Kubernetes for scaling

Could you use another ingress, purely for internal load balancing, and use a different domain to get around the fact you’d have two ingress controllers?

I.e. make the second ingress a ClusterIP?

Do you have an example on how to implement this? You can message me if you want or chat. I’m happy to try this out!
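(For reference, exposing a second ingress controller only inside the cluster means giving it a ClusterIP Service instead of a LoadBalancer. A minimal sketch, with all names illustrative and not tied to any particular chart:)

apiVersion: v1
kind: Service
metadata:
  name: internal-ingress            # illustrative name
spec:
  type: ClusterIP                   # no external IP; reachable only inside the cluster
  selector:
    app: internal-ingress-controller
  ports:
    - name: https
      port: 443
      targetPort: 443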

@richbayliss I’ve made some progress!
I’ve got the inter-container communication working. I had to add a hostname to the LoadBalancer (see here for DigitalOcean). Now I can communicate from pod to pod via https, which is great!

I’ve also added another LoadBalancer for the VPN with proxy protocol enabled, and the device connects to it, which is great!
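For reference, on DigitalOcean both of these tweaks are Service annotations. A minimal sketch showing them on a single LoadBalancer Service for brevity (names are illustrative; in practice they’d be spread over the respective Services):

apiVersion: v1
kind: Service
metadata:
  name: openbalena-vpn              # illustrative name
  annotations:
    # Workaround for kubernetes/kubernetes#66607: pods resolve the LB
    # by hostname instead of hairpinning on its external IP.
    service.beta.kubernetes.io/do-loadbalancer-hostname: "vpn.example.com"
    # Prepend the PROXY protocol header so the backend sees real client IPs.
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
spec:
  type: LoadBalancer
  selector:
    app: openbalena-vpn
  ports:
    - name: vpn
      port: 443
      targetPort: 443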
Now I have a new problem, unfortunately. I was trying to SSH into the device via a tunnel (balena tunnel). First I saw some 400 Bad Request responses to /health, so I checked that out. It seems like the HAProxy configuration doesn’t like PROXY requests, so that’s problem 1 (I’ve added accept-proxy after the port 80 bind, which seems to work). Only, port 3128 doesn’t like the PROXY headers at all.
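(The port 80 fix in HAProxy terms, assuming a frontend like the default one; a sketch, not the exact config:)

frontend http
    # accept-proxy: expect the PROXY protocol header that the
    # LoadBalancer now prepends to every incoming connection
    bind *:80 accept-proxy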

I’ve tried to add a frontend for port 3128 in the HAProxy, but this doesn’t work because the connect-proxy already listens on this port (obviously). But right now it’s impossible to say “443 uses the PROXY protocol, but the rest doesn’t”, because that’s not how a LoadBalancer works. And adding another LoadBalancer will cause DNS problems.

Maybe adding an extra HAProxy Ingress instead of a plain LoadBalancer will fix this problem; it only adds some overhead because an extra HAProxy has to run. But I’ll look into this…

This works! I’m only missing port 80 of the VPN now, but it seems like everything is functioning just fine! The only problem is that it needs an extra Ingress with a custom configuration, so I’ll have to look into whether I can implement it in the basic k8s build…

Great news! I’m stoked that you managed to get this working, and I think I am right in saying that you’re the first person to get openBalena running in K8s :tada:

port 80 of the VPN

I don’t think this is needed to have a working stack… certainly I can’t think of anything which will break as a direct result.


Nice to know!
I’m creating a Helm chart, which is almost finished, to get openBalena started in a heartbeat on K8s, and to make it easy to configure (replicas etc.).
I’ll post an update if it’s finished and pushed to the repo!


I think I’ve done it. Finally.

Disclaimer
It works on DigitalOcean, because they have a fix for issue #66607.
It probably works on AWS too, because they also have this fix, but I haven’t tested it.

I’ve created a Helm chart for open-balena, which creates all deployments, services and volumes. It also creates an NGINX Ingress for routing, as well as a HAProxy Ingress for the VPN routing. This means it creates 2 LoadBalancers. It’s necessary. Believe me.


I tried to make it easy to use if you already know how to set up the Docker version of open-balena.
You first have to execute the quickstart like you normally would. Use an existing email address for the cert-manager.

$ ./scripts/quickstart -U <user-email> -P <password> -d <domain> 

This creates the settings file at config/k8s/settings.yaml as well as the normal Docker config (config/activate).

Then you can install it on your cluster like so:

./scripts/k8s install

This will install the open-balena chart as well as cert-manager, which handles your Let’s Encrypt certificates via HTTP-01 in combination with the Ingress controller. Your super-user email address will be used to notify you if your certificates are about to expire but can’t be renewed.
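(For anyone curious what cert-manager needs for this, an HTTP-01 issuer looks roughly like the following. Modern cert-manager API shown; the chart’s actual resource names and API version may differ:)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt                 # illustrative name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <user-email>             # the email from the quickstart
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - http01:
          ingress:
            class: nginx            # challenges are served through the NGINX Ingress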

After this command is done, everything is up and running on your cluster. But you’re not done yet!
You still have to set up DNS. Because you have 2 LoadBalancers, the DNS settings are somewhat different. If you’re using DigitalOcean, the LoadBalancers are named k8s-openbalena and k8s-openbalena-vpn (guess which one is for the VPN).

So link the k8s-openbalena to the following domain names:

<yourdomain>
api.<yourdomain>
s3.<yourdomain>
registry.<yourdomain>

Link the k8s-openbalena-vpn to the following domain:

vpn.<yourdomain>
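To find the external addresses to point these records at, listing the LoadBalancer Services works on any provider:

$ kubectl get svc --all-namespaces | grep LoadBalancer

The EXTERNAL-IP column shows the address each record should resolve to.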

You’re all set. After a few minutes, the cert-manager will have the certificates ready and applied to the NGINX Ingress controller.


Upgrading / applying other settings can be done like so:

./scripts/k8s upgrade

Last, but not least, take a look at config/k8s/settings.yaml.
All your settings are placed here. You can change the replicas per deployment, the storage size per PersistentVolumeClaim, set Sentry DSNs, use an external S3 storage, and check out your username / password if you forgot them.
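(To give an idea of its shape, the file looks something along these lines; the keys are illustrative, reconstructed from the description above, not the exact schema:)

# config/k8s/settings.yaml (illustrative keys)
replicas:
  api: 1
  registry: 1
storage:
  db: 10Gi
  registry: 10Gi
s3:
  external: false     # set to true to use an external S3 provider
sentry:
  dsn: ""             # optional Sentry DSN
superuser:
  email: <user-email>
  password: <generated>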

I’ll be using this in the next couple of days. Note that the k8s support is not official: most likely there’ll be changes to the k8s support and settings, so keep that in mind.


Known issues

  • The API pod doesn’t run on the first boot. This is because it boots faster than the database pod and errors with a CONNECTION_REFUSED. The pod doesn’t crash, however, and it doesn’t try to reconnect. I don’t know why the pod doesn’t crash or reconnect, so I’m creating an issue for that in the open-balena-api repo #383 - Crash on fatal errors. So after everything has spun up, delete the API pod; this will create a new one which connects to the database.
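The workaround is a one-liner; the label and namespace below are assumptions about how the chart names things:

$ kubectl delete pod -n openbalena -l app=api

The Deployment immediately replaces the pod, and the new one connects to the now-ready database.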

@bversluijs this is awesome work, well done and thanks for your efforts! You spared me hours (if not days) of work to get this running.

I was able to test this on Kubernetes running on bare metal (single-node CentOS 7 test ‘cluster’) and can confirm this is working. NOTE: MetalLB is used as the LoadBalancer on bare metal.

I have a few comments/suggestions and would be more than happy to get involved to improve on this.

  1. When I build my k8s clusters I install cert-manager, nginx-ingress, a cert-issuer, etc., as these are requirements for almost all services, and I would like to re-use those existing services if possible. If I run ./scripts/k8s uninstall, it will remove the cert-manager that is in use by other services on the same cluster.
  2. Namespace: I think this should be made configurable (see the sketch after this list), as people might want to install openBalena into an existing namespace, or even default if that’s all that will run on the cluster; alternatively, one might want to run multiple instances of openBalena on the same cluster in different namespaces (dev, stage, prod).
  3. Monitoring: Configure Prometheus + Grafana to monitor everything.
  4. Logging: Some form of centralised logging for example Elasticsearch.
  5. DNS (optional): Add External DNS to automatically create the records required by ingress-controller.
  6. Cloud (optional): Add support for all major providers (AWS, GKE, Azure, etc).
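(For point 2, Helm already carries everything needed: if the scripts shell out to Helm, a caller-chosen namespace could look like the following. Helm 3 syntax; the chart path is a placeholder:)

$ helm install open-balena <chart-path> \
    --namespace openbalena-dev \
    --create-namespace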

Hi @nico,

Thanks for your kind words! It took me days to get it working because of all sorts of complications, but I’m happy to hear that other people use it!

Answering some of your questions:

  1. You’re right. I’ve had this issue myself, but I’m trying to figure out the best approach here. Add another command for cert-manager, or just an extra instruction?
  2. Couldn’t agree more. I used a dedicated namespace because I thought that’d combine everything in one place, but like you said, you can have multiple instances running on the same cluster. So I’m changing that!
  3. It’d be awesome to use Prometheus + Grafana to monitor everything. I’ve never worked with them so far, but I’m planning to, so feel free to publish a write-up on how to do it. However, it’s not necessary for open-balena to run, so I think having the option is nice, but it shouldn’t be integrated into the open-balena Helm chart.
  4. Same as above: the option would be nice, but not integrated into the open-balena Helm chart. Also, Sentry is integrated into some of the Node.js containers of open-balena to track errors; you could set the Sentry DSN for both of them.
  5. Never heard of it, but it seems really promising. I’ll definitely check that out!
  6. All major providers should be supported, 100% agreed. But I don’t have experience with most of them, so I think it’s better for the community to add them when they’re working with them. They have more knowledge about the cloud providers and can probably configure the Helm chart better than me!

I’m happy to keep in contact and improve the K8s support; I’ve created it and open-sourced it for this reason. I’m not sure if the Balena team will create official Kubernetes support, I only know they’re busy with a new major version of open-balena. But maybe someone from the Balena team can confirm whether they’re working on Kubernetes support? If not, the rest of the community and I can probably create a stable version and implement it in the official open-balena repository!

For everyone that’d like to help and/or share thoughts about how to improve it, please feel free to react or contact me so that we can chat!


@bversluijs it would be interesting to have a chat about everything you’ve done so far and discuss how/what you’re using openBalena for. Please PM your Skype/Messaging details so that we can chat more as I definitely want to contribute.

  1. Certificates: I think certificates can be moved to an extra instruction/argument on installation, because in some cases people might be using an alternative cert provider, using existing certificates, or manually creating certificates for whatever reason.
  2. Namespace: Awesome, let me know what I can do to help?
  3. Monitoring: This is optional, yes; it could be added as an additional instruction/argument on installation. I use Prometheus + Grafana in all my stacks, so I should be able to do a write-up. Generally I install this cluster-wide for auto-discovery of resources, and then it’s just a matter of setting up dashboards. Kubernetes Monitoring Stack with Prometheus, Grafana and Alertmanager
  4. Logging: Agreed, this is optional, same as above. Sentry is a paid-for service, whereas Elasticsearch can easily be run on the same k8s cluster and integrated into the existing cluster ecosystem to collect logs for all pods and services - Helm chart | Set Up an Elasticsearch, Fluentd and Kibana (EFK) Logging Stack on Kubernetes
  5. DNS: I’ve been testing it on a few projects and it has been working great; this can also be added as an optional instruction/argument on installation.
  6. Cloud: I can assist with several providers as I have a few running clusters.

Like I said I would gladly contribute and I think that this can be merged upstream to the official open-balena repository.

Update for everyone that’s watching this topic or using the k8s chart of openBalena: I’ve made some changes.

The DB, Redis, Registry and S3 (Minio) containers all use PersistentVolumes. Because of this, they’re stateful containers, so I’ve changed them to StatefulSets. For everyone who doesn’t know the difference:

A Deployment is a resource to deploy a stateless application; if it uses a PVC, all replicas share the same Volume and none of them has its own state. A StatefulSet is used for stateful applications; each replica of the pod has its own state and uses its own Volume.

Based on my knowledge so far, this seems a better fit for those containers.
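For illustration, a StatefulSet gives each replica its own PersistentVolumeClaim via volumeClaimTemplates. A minimal sketch for the database (names, image and sizes are illustrative, not the chart’s exact values):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: openbalena-db               # illustrative name
spec:
  serviceName: openbalena-db
  replicas: 1
  selector:
    matchLabels:
      app: openbalena-db
  template:
    metadata:
      labels:
        app: openbalena-db
    spec:
      containers:
        - name: postgres
          image: postgres:11        # illustrative image
          env:
            - name: POSTGRES_PASSWORD
              value: <db-password>  # placeholder
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # Each replica gets its own PVC: data-openbalena-db-0, data-openbalena-db-1, ...
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi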


For everyone that uses k8s for openBalena at the moment
Because StatefulSets are now used for those containers, you’ll have to migrate your data, so make a backup first. The StatefulSets will create new PersistentVolumes, and the old PersistentVolumes will probably get deleted.

I’m using a cloud provider’s S3, so I only had to migrate the database. I used this StackOverflow answer to back up my Postgres database and restore it into the StatefulSet database. When you’re doing this, scale the API replicas to 0 so nothing uses the Postgres database; otherwise you’ll get errors.
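The gist of that migration, translated to kubectl (pod names, user and database are placeholders):

# Dump from the old Deployment-backed pod...
$ kubectl exec <old-db-pod> -- pg_dump -U <db-user> <db-name> > backup.sql
# ...and restore into the new StatefulSet pod.
$ kubectl exec -i <new-db-pod> -- psql -U <db-user> <db-name> < backup.sql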


Some questions for the Balena team (out of curiosity):

  • Why does Redis need a persistent volume? Isn’t it stateless?

  • Why does the Registry need a persistent volume? Isn’t it stateless?
    And by stateless I mean: if the data for those containers gets lost and a new container starts with a fresh volume, what happens then?

  • What containers need to be restarted when one of the containers stops? (For example, I know the API has to be restarted when the DB container is restarted.)

  • Can Postgres run in a cluster by default, or with some added options?

We’ve been using openBalena with k8s for a couple of weeks now, and all seems steady. We don’t have devices running in the field at this moment, because we’re waiting for the new update of openBalena (hopefully it comes soon, because otherwise we’ll have no option but to ship anyway). But the Kubernetes chart is working nicely!