What are we working on? The Balena Notes

It’s been more than 2 years since this thread was updated and so much has changed since then.

Back in 2020, we moved our hosting infrastructure from a CoreOS fleet to Kubernetes. This made it smoother to roll out updates and made recovery from incidents faster. We follow a GitOps approach using Flux for releasing updates, so any changes in the environment are version controlled. I am now migrating the clusters from Flux v1 to Flux v2, which should reduce the number of API calls to Docker Hub for obtaining the image tags. We will be running production on Kubernetes for a while as we work on making balenaOS :balena: easy to deploy to the cloud.

Would you like to share some tidbits about that new CI that we are building, @Hades32 ?

6 Likes

Thanks Carlo!

Since joining Balena at the beginning of this year, I’ve been pushing forward a CI system that will make our lives a lot easier! We have 100s of repositories with many cross dependencies (because we reuse as much as possible to do a lot with a small team), we build server apps, CLIs, desktop apps and operating systems - and a lot of that for multiple architectures/machines!

Our so-called “Transformers” build architecture is based on the idea of having many small building blocks, each with opinions on how to build a specific thing into one or many other things. The big difference from GitHub Actions and similar systems is that every “thing” has a type and well-defined metadata, so that re-use becomes a breeze and, in fact, automatic!
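To make the "typed things" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration of type-driven dispatch, not the actual Transformers implementation: `Artifact`, `REGISTRY`, `transformer`, `build_image`, and the type names `source-repo` and `container-image` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Artifact:
    """A 'thing' with a type and some metadata (hypothetical shape)."""
    type: str
    name: str


Transformer = Callable[[Artifact], List[Artifact]]

# Maps an input type to every building block that knows how to handle it.
REGISTRY: Dict[str, List[Transformer]] = {}


def transformer(input_type: str):
    """Register a building block for a given input type."""
    def register(fn: Transformer) -> Transformer:
        REGISTRY.setdefault(input_type, []).append(fn)
        return fn
    return register


@transformer("source-repo")
def build_image(src: Artifact) -> List[Artifact]:
    # One input can become many outputs, e.g. one image per architecture.
    return [
        Artifact("container-image", f"{src.name}:amd64"),
        Artifact("container-image", f"{src.name}:arm64"),
    ]


def run(artifact: Artifact) -> List[Artifact]:
    """Apply every registered block whose input type matches the artifact."""
    outputs: List[Artifact] = []
    for fn in REGISTRY.get(artifact.type, []):
        outputs.extend(fn(artifact))
    return outputs
```

Because the blocks are selected by type rather than wired up by hand, adding a new block for `source-repo` would make it apply to every repository of that type without touching any per-repo configuration.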

Besides that, I’m currently trying to get my Pine64 to run BalenaOS by contributing there. Yes, doing work all across the company - if you want to! - is definitely a thing here :slight_smile:

And as he’s helping me with the latter, I nominate next @klutchell !

6 Likes

Thanks for the nomination @Hades32 !

As part of the OS team I’ve recently been focused on a new suite of automated tests to reduce the friction of releasing new versions to production. This involved reviving an old framework (not alone – with a small team) and rewriting a lot of tests that didn’t align with the current state of the OS.

It’s no surprise that some of the improvements to testing have also resulted in improvements in the OS itself!

On top of that I’ve taken over maintenance of our balena-preload module as used by the balena CLI. We have some cool ideas planned for other ways we could use this module in production but first it requires some TLC and I’ve been doing my best in that regard.

The rest of my time recently has been split between docs, labs, and devops, helping out wherever my experience takes me. Today I’m trying to get proper docker buildx support into balena-ci and I’m learning about a ton of backend components I’ve never seen before!

I’ve been here ~10 months and before I started I told myself that I would get balenaOS running on my RockPro64 and I haven’t even cloned a repo yet. Kudos to @Hades32 for going as far as he has with Pine64!

I’d love to hear what my partner in docs, testbot, and general shenanigans @vipulgupta2048 has been up to!

7 Likes

Thanks for the ping @klutchell but before that thanks for being the awesome dude you are and bringing your best each and every day. Actually, that’s true for all my balenistas, I love y’all.

As part of the testbot team (R&D), my primary focus is automated hardware testing.

  • The hardware aspect of it is the testbot.
  • The software aspect is Leviathan, the distributed testing framework we are developing/reviving.
  • The research aspect is finding alternative solutions to hardware & software problems, test optimization, or even debugging tricky issues with several variables at play.

I work with the hardware and on the software with the whole team pitching in with their respective expertise. I spend my time improving Leviathan’s UX, overall architecture, maintaining testbotSDK, developing new features and new test suites. I am also one of the remote caretakers of our testbot rig making sure it’s in top shape with regular updates, attention to logs, and in-depth debugging.

My secondary focus is docs. With a product surface as big as balena’s, docs are nothing short of their own challenge. The team is collectively taking it head-on, especially with the release party we organized recently. I work on new docs initiatives or major changes that need to be planned & executed.

The rest of my time is split between projects, support, self-learning, and running memeservice at balena. Balena is actually my first job after college and what an adventure it has been through and through. And, to continue telling you about the adventure that is balena, I’d like to nominate my fellow testbot caretaker and recent balena Hack Week winner @rcooke-warwick :trophy:

4 Likes

Thanks Vipul :slight_smile:

My mission at the moment is to make automated testing on real hardware a reality.

  • This means working on the hardware (testbot) - that we use to remotely control, provision and power devices under test (DUTs)
  • Creating automated tests to replace manual ones being done at the moment, as well as adding new test cases
  • Working on the testing framework to remotely test things like OS releases on real hardware
  • Making use of these testing tools for other applications, such as hardware QA

Over the past 6 months, the focus has been on stabilising our testing stack, and replacing the need for manual testing of OS releases - we’ve managed to create an automated test suite that is essentially functionally equivalent to the set of manual tests, and the current challenge is scaling this up so it gets run for every device type! (currently it’s only running on a rig of ~10 devices from the RPi family)

Another area I’ve worked on is the hardware development process - A process that aims to reduce friction in developing hardware, reducing iteration times.

I’ll nominate @anujdeshpande next!

5 Likes

Thanks @rcooke-warwick !

I am working on hardware related things at balena, with one side project that I sometimes get back to :slight_smile:

  • I am working on the software that will go on the Fin 2 as part of its onboard microcontroller’s firmware w/ AlexB.
  • In the last couple of weeks, I have been focused on improving our hardware process. Hardware development has traditionally been slow (compared to software). But there are some obvious improvements that we can make to fix that. We are essentially building a CI that is suitable for hardware projects like Fin, Etcher Pro, their respective accessories, and more! So this meta project aims to take over some of the work that hardware designers do, but don’t need to. Ryan, Nico, Konstantinos and AlexB are working with me on this :slight_smile:
  • I started working on a small project w/ @andrewnhem to enable SSDs on the Raspberry Pi 4 running balenaOS. This would mean we have better storage than SD cards! Great for projects like NextCloud. If anyone’s interested in helping out with that, let us know!

There’s loads to be done to build the next generation of hardware, as well as improving the process of developing hardware itself!

I nominate @phil-d-wilson next!

5 Likes

“Thank you” @anujdeshpande ? :slight_smile:

I’ve got my fingers in lots of pies at balena including improvements to exposing device ports to the internet, mDNS to discover and use balena devices, and local device management for open fleets. But the main focus of my work is currently balenaBlocks and helping to lead the balena labs.

Blocks are a form of app enablement, the idea being to give developers (of all abilities) little parcels of functionality that they can drop into their IOT/edge application. My current work here is paving the way for everyone to build and contribute blocks, to build the ecosystem. To do this I’m writing blog posts and documentation, speaking with other companies who want to make blocks for their products (watch this space!) and pushing improvements to how we expose blocks on hub and allow users to deploy them.

The labs is balena’s team of pet users. We do what our users do: make projects, use the product, find areas to improve and hack on stuff. Lately we’ve started to induct all new balena joiners into the labs for a residency, where they make a project and tell everyone about it. The residency allows people to come into balena and work on their own idea, with zero pressure, whilst they get to know the product and the team. It’s a great way to absorb the unique culture of balena and learn how we work. I’m super excited about the projects the first batch of labs residents are currently working on, and can’t wait to see their project blog posts!!!

I nominate @20k-ultra to share next - he’s always got interesting things on the go!

:heart: :balena:

3 Likes

Ah, what a cool thread! Thanks for the ping @phil-d-wilson.

Well, I focus primarily on the Supervisor, which is the application on device that manages services and configurations (this trivializes the complexities of our target state funnel algorithm). This project is really intricate and has worked from the beginning, when Balena was just deploying single containers, to multi-container applications now and, in the future, multi-applications. These paradigms have different requirements for the features we offer, so being reliable is our #1 priority. The team have done a lot of work improving test coverage, and even better, we’ve made some tooling to sanely mock docker engine in our tests using a lib called mockerode that we’ve built. I’ll let @pipex talk about that more when he gets pinged, and we are really excited about improving the Supervisor’s E2E testing with testbot!

So, aside from continuing to polish the Supervisor, we have a spec in the works for a fallback state. This allows the Supervisor to know when it’s time to give up applying the target state and fall back to a reliable state known as the Goldilock State. It’s very important that we don’t break the universal law that the Supervisor listens to what the target state wants, so this spec is pretty different from what we’ve normally worked on. The aim is to stop a device from continuing to try to run a state when it knows that state has been failing for hours.
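As a rough illustration of the decision such a spec has to describe, here is a small Python sketch. It is not the Supervisor's code or the actual spec: `FallbackPolicy`, `choose_state`, and the idea of a single time threshold are assumptions made purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackPolicy:
    """Hypothetical policy: give up on the target state after failing too long."""
    max_failing_seconds: float  # assumed threshold, not taken from the spec

    def choose_state(self, target: str, fallback: str,
                     first_failure_at: Optional[float], now: float) -> str:
        # No recorded failure: keep obeying the target state as usual.
        if first_failure_at is None:
            return target
        # The target has been failing for at least the threshold: fall back.
        if now - first_failure_at >= self.max_failing_seconds:
            return fallback
        # Still within the grace period: keep retrying the target state.
        return target
```

For example, `FallbackPolicy(max_failing_seconds=3600)` would keep retrying the target for an hour after the first failure before switching to the fallback. A real design would also need to decide when to leave the fallback state again, which this sketch does not cover.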

In the spirit of reliability, to be able to build good testing you need to automate some stuff. A bunch of people are working on a big change to our release model which allows first-class versioning and marking a release as draft or final. With this new model we can then make pipelines that push a draft release in a pull request and then mark it as final on merge. I’m in charge of the GitHub Action which will offer this workflow in a very easy-to-use package. In fact, we’ll be using this ourselves with the Supervisor! Imagine having designated test devices in a fleet that will run the draft releases while the rest of your production fleet only tracks final versions. It will all happen automatically via your git workflow.
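The draft/final split can be pictured with a small Python sketch. This is only a model of the idea described above, not the balena API or the GitHub Action: `Release`, `is_final`, and `release_for_device` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    version: str
    is_final: bool  # False means the release is still a draft


def release_for_device(releases: List[Release],
                       is_test_device: bool) -> Optional[Release]:
    """Pick the newest release a device should run.

    `releases` is assumed to be ordered oldest to newest. Test devices may
    run drafts; every other device only tracks final releases.
    """
    for release in reversed(releases):
        if release.is_final or is_test_device:
            return release
    return None
```

With a final `1.0.0` followed by a draft `1.1.0` pushed from a pull request, a test device would pick up `1.1.0` while production devices stay on `1.0.0` until the draft is marked final on merge.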

Oh, I also answer lots of security reports! We do a great job of answering all reports and always get back to you. If you think you’ve found something, reach out to support@balena.io and check out our security page: Security - Balena Documentation.

Like most people who work at Balena, I have a lot of projects going on, since we are encouraged to help wherever we find that we can provide value to the team. The things listed here are just the main ones I focus on; there’s a lot of stuff behind the scenes I’d like to get to eventually, like an interactive flow chart for debugging devices. It’s intended for our support agents, but let’s make it open source and allow anyone to use it!

I’m going to ping @codewithcheese because I always learn a lot when talking with Tom and he’s always working on some super meta project.

5 Likes

Thanks @20k-ultra you’ve done amazing work with the supervisor!

I am currently focusing on building a next-generation DevOps tool, to automate the complete pipeline from source code change all the way through to product creation and deployment.

In addition to balenaCloud, there is openBalena, and we also offer a version of balenaCloud for on-prem customers. Each of these products is composed of different combinations of the same components, and they are all deployed in different ways. Keeping all the products and components in sync currently requires many actions by the team, can take time, and each step can introduce bugs.

To be able to provide the most stable products possible, we are working on automating the complete process. So when a balenista or an outside contributor pushes new source code, that code can be tested and incorporated into all the relevant products, and those products can be tested and automatically released and deployed.

A number of concepts have been developed to automate this. First is contracts. Contracts define a component and its dependencies and help our systems reason about how components can be composed together. Second is transformers; this is our CI pipeline on steroids. It can automate all the steps by reacting to changes in contracts. Finally, there is Katapult, which I am working on. It can combine low-order contracts, such as a service, into higher-order contracts, such as a product. The higher-order contracts are then used to build product releases, development environments, and deployment configuration.
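To give a feel for what "combining low-order contracts into a higher-order contract" could mean, here is a toy Python sketch. It is not Katapult's data model or API: the `Contract` fields, the `compose` function, and the `service`/`product` type names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contract:
    """A component description: what it is and what it depends on."""
    slug: str
    type: str
    requires: List[str] = field(default_factory=list)


def compose(slug: str, services: List[Contract]) -> Contract:
    """Combine service contracts into one product contract.

    Composition fails if any service depends on something that is not
    part of the set being composed.
    """
    available = {c.slug for c in services}
    needed = {dep for c in services for dep in c.requires}
    missing = sorted(needed - available)
    if missing:
        raise ValueError(f"unmet dependencies: {missing}")
    # The product contract lists the services it is made of.
    return Contract(slug=slug, type="product", requires=sorted(available))
```

Composing an `api` service that requires `db` together with a `db` service yields a product contract, while composing `api` on its own raises an error. That kind of check is what lets a system reason about whether a given combination of components is valid before anything is built or deployed.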

Note: there is an open source Katapult repo. However it is a basic prototype which doesn’t represent the concepts in development now.

Next I nominate… @Ereski, I’m curious what magical solutions you’re working on next!

3 Likes

Thank you @codewithcheese! Unfortunately, nothing magical at the moment :joy:

I am currently dirty with virtual grease, working at the heart of our internal management system, Jellyfish. Recently we deployed a partial redesign to our database schema that yielded considerable improvement to queries we routinely run in Jellyfish. Better UX and less $$$ required by infrastructure are always good :smiley:

I am deep in our CI/CD pipeline. In balena, and especially with Jellyfish, every PR that is merged creates a new version that is then, ultimately, deployed in production. There are several steps in this pipeline, and a single change may take a few hours to be fully tested and make it to production. There has been a lot of work on improving that time, but it is still nowhere near comfortable. Trying to deploy urgent fixes, in particular, can be troublesome. The goal of Jellyfish itself is to reduce friction for everyone in balena, so it is a twist of irony that there’s quite a bit of friction in working with Jellyfish itself.

Apart from that, it’s mostly grunt (but necessary) work at the moment. Might do magic later :stuck_out_tongue:

I nominate @lucianbuzzo. I’m sure you have a lot to say after more than two years since you last wrote here.

2 Likes

Thanks for the nomination @Ereski !
It’s pretty wild reading my last post in this thread from Feb '19 - we actually released the schema form component in rendition, which you can see here.

As has already been mentioned above, the cycle time on Jellyfish has been painfully slow and improving the situation has been my number 1 priority over the last month or so. The cost of slow development and CI/CD times compounds: if a test takes a long time, you can end up tackling some other minor task, or get distracted from your main task, and this context switching eats into your productivity. Additionally, if you have flaky tests, then the CI/CD run might fail while you are distracted by that minor task, so you don’t notice the failure and don’t immediately re-attempt the CI run. Before you know it, you’ve spent 4 hours effectively waiting for loading screens!
For any project to be successful, I think it is essential that the software remains malleable (easy to change). At the very least trivial changes must be trivial to implement. Once it becomes difficult to change software, it begins to atrophy, refactors become enormously difficult tasks and everyone hates working on the project. The end result is often not catastrophic, but insidious. Because small changes and fixes are difficult and time-consuming, they aren’t done (engineers look for a bigger “bang for their buck”) resulting in the system dying “a death of a thousand cuts”, as all the small problems go unfixed and the software becomes unusable.

So with all this in mind, we committed ourselves to bring down the complexity and cycle time attached to making code changes, and we’ve been able to dramatically improve the reliability of our CI runs as well as more than tripling the speed of them! We were also able to identify many parts of the system that could be refactored to reduce the amount of code required and the number of module interdependencies needed, which is next on the roadmap and something we’ve already started work on.

Outside of working on Jellyfish, I’ve also been spending a lot of time designing and codifying our hiring process. Like everything at balena we really tried to approach hiring from first principles and to build it as a product, and I think we’ve done a lot to not only help us hire better candidates but to also provide them with a much better experience, whether they join the company or not!

Many software companies have hiring processes that optimise for hiring people who have large egos and/or are good at competitive programming, rather than people who would be great team members.
Rather than trying to emulate how Google or Facebook hire, we take a more focused approach, applying first principles thinking to understand our hiring process better. I really wanted to create a process for hiring engineers that allowed candidates to showcase their logical reasoning, problem-solving, critical thinking and communication skills, whilst being open-ended enough to not constrain people to predetermined answers. I think there are still a lot of unexplored areas, but what we’ve achieved so far has been very promising :blush:

@fisehara What have you been up to?

By the way, we’re hiring!

2 Likes

Thanks for the nomination @lucianbuzzo !

I’m lacking any audit trail in the forums, as I joined Balena’s backend team in May and am still onboarding into backend-related code bases, especially the Balena API. Our Balena API mainly builds on top of the open source projects open-balena-api and pinejs, where the latter is the foundation and provides a business data model generator, a just-in-time SQL query generator / executor and an OData API to query the data model. Right now, the elegance of letting pinejs provide resource endpoints at runtime without implementing them manually comes with a higher complexity in the backend implementation, and that pretty much outlines my daily challenge.

Besides working on and onboarding into the Balena API codebase, I’ve started figuring out how to auto-generate the Balena API documentation. This includes some intermediate steps, namely a balena API model to OData specification generator, an OData specification to OpenAPI specification converter, and finally a rendering step from the OpenAPI specification to a hosted documentation page. I’m currently evaluating the OASIS OData-to-OpenAPI converter and OpenAPI rendering tools like Redocly.

For the Balena build process, I’ve pushed forward a feature for our Balena builder to support env_file tags in the docker-compose file. The foundation is laid and now I’m working on integrating the feature into our cloud builder codebase. Finalising this feature includes integrating it into the Balena CLI.

During our recent hack week I worked on utilising weaviate to add a semantic search engine as an augmentation for Jellyfish. This tool transfers Jellyfish entities and indexes them in a vector space database. From this database, similarity queries and distance metrics can be evaluated by just measuring the Euclidean distance between entries. Maybe it will help us understand our knowledge base better and support empirical decisions.
Hopefully I’ll keep working on this to ship it as a real augmentation that everyone on our team can use to explore our knowledge base, Jellyfish.
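For anyone curious what "measuring the Euclidean distance between entries" looks like, here is a tiny pure-Python sketch. It does not use the weaviate client at all, and the entry names and vectors would be whatever an embedding step produces; `euclidean` and `nearest` are just illustrative helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float],
            index: Dict[str, Sequence[float]],
            k: int = 3) -> List[Tuple[str, float]]:
    """Return the k entries closest to the query, nearest first."""
    scored = [(name, euclidean(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

A smaller distance means two entries sit closer together in the vector space, which is what gets read as "more similar". This brute-force scan is fine for a demo; a vector database exists precisely to avoid comparing the query against every entry.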

Last but not least, I’m starting some real hardware hacking with a Raspberry Pi, the Raspberry Pi Sense HAT and a stereo microphone HAT. As I’m a hobby saxophonist who hasn’t played for a decade now, I need to train my embouchure. So I came up with the idea that the Raspberry Pi will show a note to play on the Sense HAT and measure the deviation of the played tone from it via the microphones. Maybe this evolves into a real play-along trainer, let’s see how things go.
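One common way to express how far a played tone is from the target note is in cents (hundredths of a semitone). The Python sketch below shows that calculation only. It assumes 12-tone equal temperament with A4 at 440 Hz and uses MIDI note numbers to name the target, neither of which comes from the project itself; detecting the played frequency from the microphone signal is not shown.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4


def note_frequency(midi_note: int) -> float:
    """Frequency of a note in 12-tone equal temperament."""
    return A4_HZ * 2 ** ((midi_note - A4_MIDI) / 12)


def deviation_cents(played_hz: float, target_midi_note: int) -> float:
    """Deviation from the target in cents: positive is sharp, negative is flat."""
    if played_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1200 * math.log2(played_hz / note_frequency(target_midi_note))
```

A value near zero means the note is in tune, and one semitone corresponds to 100 cents, so the Sense HAT could show something as simple as a bar that grows with the absolute deviation.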

I’m calling out for @markcorbinuk to give us insight into the depths of hardware bringups - How is RISC-V doing? :slight_smile:

1 Like