Kubernetes for scaling

Could you use another ingress, purely for internal load balancing, and use a different domain to get around the fact you’d have two ingress controllers?

I.e. make the second ingress a ClusterIP?

Do you have an example on how to implement this? You can message me if you want or chat. I’m happy to try this out!
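(For reference, exposing a second ingress controller only inside the cluster means giving it a ClusterIP Service instead of a LoadBalancer. A minimal sketch, with all names illustrative and not tied to any particular chart:)

apiVersion: v1
kind: Service
metadata:
  name: internal-ingress            # illustrative name
spec:
  type: ClusterIP                   # no external IP; reachable only inside the cluster
  selector:
    app: internal-ingress-controller
  ports:
    - name: https
      port: 443
      targetPort: 443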

@richbayliss I’ve made some progress!
I’ve got the inter-container communication working. I had to add a hostname to the LoadBalancer (see here for DigitalOcean). Now I can communicate from pod to pod via https, which is great!

I’ve also added another LoadBalancer for the VPN with proxy protocol enabled, and the device connects to it, which is great!
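For reference, on DigitalOcean both of these tweaks are Service annotations. A minimal sketch showing them on a single LoadBalancer Service for brevity (names are illustrative; in practice they’d be spread over the respective Services):

apiVersion: v1
kind: Service
metadata:
  name: openbalena-vpn              # illustrative name
  annotations:
    # Workaround for kubernetes/kubernetes#66607: pods resolve the LB
    # by hostname instead of hairpinning on its external IP.
    service.beta.kubernetes.io/do-loadbalancer-hostname: "vpn.example.com"
    # Prepend the PROXY protocol header so the backend sees real client IPs.
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
spec:
  type: LoadBalancer
  selector:
    app: openbalena-vpn
  ports:
    - name: vpn
      port: 443
      targetPort: 443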
Now I have a new problem, unfortunately. I was trying to SSH into the device via a tunnel (balena tunnel). First I saw some 400 Bad Request responses to /health, so I checked that out. It seems like the HAProxy configuration doesn’t like PROXY requests, so that’s problem 1 (I’ve added accept-proxy after the port 80 bind, which seems to work). Only, port 3128 doesn’t like the PROXY headers at all.
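(The port 80 fix in HAProxy terms, assuming a frontend like the default one; a sketch, not the exact config:)

frontend http
    # accept-proxy: expect the PROXY protocol header that the
    # LoadBalancer now prepends to every incoming connection
    bind *:80 accept-proxy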

I’ve tried to add a frontend for port 3128 in the HAProxy, but this doesn’t work because the connect-proxy already listens on this port (obviously). But right now it’s impossible to say “443 uses the PROXY protocol, but the rest doesn’t”, because that’s not how a LoadBalancer works. And adding another LoadBalancer will cause DNS problems.

Maybe adding an extra HAProxy Ingress instead of a plain LoadBalancer will fix this problem; it only adds some overhead because an extra HAProxy has to run. But I’ll look into this…

This works! I’m only missing port 80 of the VPN now, but it seems like everything is functioning just fine! The only problem is that it needs an extra Ingress with a custom configuration, so I’ll have to look into whether I can implement it in the basic k8s build…

Great news! I’m stoked that you managed to get this working, and I think I am right in saying that you’re the first person to get openBalena running in K8s :tada:

port 80 of the VPN

I don’t think this is needed to have a working stack… certainly I can’t think of anything which will break as a direct result.


Nice to know!
I’m creating a Helm chart, which is almost finished, to get openBalena started in a heartbeat on K8s, and to make it easy to configure (replicas etc.).
I’ll post an update if it’s finished and pushed to the repo!


I think I’ve done it. Finally.

Disclaimer
It works on DigitalOcean, because they have a fix for issue #66607.
It probably works on AWS too, because they also have this fix, but I haven’t tested it.

I’ve created a Helm chart for open-balena, which creates all deployments, services and volumes. It also creates an NGINX Ingress for routing, as well as a HAProxy Ingress for the VPN routing. This means it creates 2 LoadBalancers. It’s necessary. Believe me.


I tried to make it easy to use if you already know how to set up the Docker version of open-balena.
You first have to execute the quickstart like you normally would. Use an existing email address for the cert-manager.

$ ./scripts/quickstart -U <user-email> -P <password> -d <domain> 

This creates the settings file at config/k8s/settings.yaml as well as the normal Docker config (config/activate).

Then you can install it on your cluster like so:

./scripts/k8s install

This will install the open-balena chart as well as cert-manager, which handles your Let’s Encrypt certificates via HTTP-01 in combination with the Ingress controller. Your super-user email address will be used to notify you if your certificates are about to expire but can’t be renewed.
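(For anyone curious what cert-manager needs for this, an HTTP-01 issuer looks roughly like the following. Modern cert-manager API shown; the chart’s actual resource names and API version may differ:)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt                 # illustrative name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <user-email>             # the email from the quickstart
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - http01:
          ingress:
            class: nginx            # challenges are served through the NGINX Ingress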

After this command is done, everything is up and running on your cluster. But you’re not done yet!
You still have to set up DNS. Because you have 2 LoadBalancers, the DNS settings are somewhat different. If you’re using DigitalOcean, the LoadBalancers are named k8s-openbalena and k8s-openbalena-vpn (guess which one is for the VPN).

So link the k8s-openbalena to the following domain names:

<yourdomain>
api.<yourdomain>
s3.<yourdomain>
registry.<yourdomain>

Link the k8s-openbalena-vpn to the following domain:

vpn.<yourdomain>
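To find the external addresses to point these records at, listing the LoadBalancer Services works on any provider:

$ kubectl get svc --all-namespaces | grep LoadBalancer

The EXTERNAL-IP column shows the address each record should resolve to.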

You’re all set. After a few minutes, the cert-manager will have the certificates ready and applied to the NGINX Ingress controller.


Upgrading / applying other settings can be done like so:

./scripts/k8s upgrade

Last, but not least, take a look at config/k8s/settings.yaml.
All your settings are placed here. You can change the replicas per deployment, the storage size per PersistentVolumeClaim, set Sentry DSNs, use an external S3 storage, and check out your username / password if you forgot them.
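(To give an idea of its shape, the file looks something along these lines; the keys are illustrative, reconstructed from the description above, not the exact schema:)

# config/k8s/settings.yaml (illustrative keys)
replicas:
  api: 1
  registry: 1
storage:
  db: 10Gi
  registry: 10Gi
s3:
  external: false     # set to true to use an external S3 provider
sentry:
  dsn: ""             # optional Sentry DSN
superuser:
  email: <user-email>
  password: <generated>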

I’ll be using this in the next couple of days. Note that the k8s support is not official: most likely there’ll be changes to the k8s support and settings, so keep that in mind.


Known issues

  • The API pod doesn’t run on the first boot. This is because it boots faster than the database pod and errors with a CONNECTION_REFUSED. The pod doesn’t crash, however, and it doesn’t try to reconnect. I don’t know why the pod doesn’t crash or reconnect, so I’m creating an issue for that in the open-balena-api repo #383 - Crash on fatal errors. So after everything has spun up, delete the API pod; this will create a new one which connects to the database.
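The workaround is a one-liner; the label and namespace below are assumptions about how the chart names things:

$ kubectl delete pod -n openbalena -l app=api

The Deployment immediately replaces the pod, and the new one connects to the now-ready database.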

@bversluijs this is awesome work, well done and thanks for your efforts! You spared me hours (if not days) of work to get this running.

I was able to test this on Kubernetes running on bare metal (single-node CentOS 7 test ‘cluster’) and can confirm this is working. NOTE: MetalLB is used as the LoadBalancer on bare metal.

I have a few comments/suggestions and would be more than happy to get involved to improve on this.

  1. When I build my k8s clusters I install cert-manager, nginx-ingress, a cert-issuer, etc., as these are requirements for almost all services, and I would like to re-use those existing services if possible. If I run ./scripts/k8s uninstall, it will remove the cert-manager that is in use by other services on the same cluster.
  2. Namespace: I think this should be made configurable (see the sketch after this list), as people might want to install openBalena into an existing namespace, or even default if that’s all that will run on the cluster; alternatively, one might want to run multiple instances of openBalena on the same cluster in different namespaces (dev, stage, prod).
  3. Monitoring: Configure Prometheus + Grafana to monitor everything.
  4. Logging: Some form of centralised logging for example Elasticsearch.
  5. DNS (optional): Add External DNS to automatically create the records required by ingress-controller.
  6. Cloud (optional): Add support for all major providers (AWS, GKE, Azure, etc).
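(For point 2, Helm already carries everything needed: if the scripts shell out to Helm, a caller-chosen namespace could look like the following. Helm 3 syntax; the chart path is a placeholder:)

$ helm install open-balena <chart-path> \
    --namespace openbalena-dev \
    --create-namespace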

Hi @nico,

Thanks for your kind words! It took me days to get it working because of all sorts of complications, but I’m happy to hear that other people use it!

Answering some of your questions:

  1. You’re right. I’ve had this issue myself, but I’m trying to figure out the best approach here. Add another command for cert-manager, or just an extra instruction?
  2. Couldn’t agree more. I used a dedicated namespace because I thought that’d combine everything in one place, but like you said, you can have multiple instances running on the same cluster. So I’m changing that!
  3. It’d be awesome to use Prometheus + Grafana to monitor everything. I’ve never worked with them so far, but I’m planning to, so feel free to publish a write-up on how to do it. However, it’s not necessary for open-balena to run, so I think having the option is nice, but it shouldn’t be integrated into the open-balena Helm chart.
  4. Same as above: the option would be nice, but not integrated into the open-balena Helm chart. Also, Sentry is integrated into some of the Node.js containers of open-balena to track errors; you could set the Sentry DSN for both of them.
  5. Never heard of it, but it seems really promising. I’ll definitely check that out!
  6. All major providers should be supported, 100% agreed. But I don’t have experience with most of them, so I think it’s better for the community to add them when they’re working with them. They have more knowledge about the cloud providers and can probably configure the Helm chart better than me!

I’m happy to keep in contact and improve the K8s support; I’ve created it and open-sourced it for this reason. I’m not sure if the Balena team will create official Kubernetes support, I only know they’re busy with a new major version of open-balena. But maybe someone from the Balena team can confirm whether they’re working on Kubernetes support? If not, the rest of the community and I can probably create a stable version and implement it in the official open-balena repository!

For everyone that’d like to help and/or share thoughts about how to improve it, please feel free to react or contact me so that we can chat!


@bversluijs it would be interesting to have a chat about everything you’ve done so far and discuss how/what you’re using openBalena for. Please PM your Skype/Messaging details so that we can chat more as I definitely want to contribute.

  1. Certificates: I think certificates can be moved to an extra instruction/argument on installation, because in some cases people might be using an alternative cert provider, using existing certificates, or manually creating certificates for whatever reason.
  2. Namespace: Awesome, let me know what I can do to help?
  3. Monitoring: This is optional, yes; it could be added as an additional instruction/argument on installation. I use Prometheus + Grafana in all my stacks, so I should be able to do a write-up. Generally I install this cluster-wide for auto-discovery of resources, and then it’s just a matter of setting up dashboards. Kubernetes Monitoring Stack with Prometheus, Grafana and Alertmanager
  4. Logging: Agreed, this is optional, same as above. Sentry is a paid-for service, whereas Elasticsearch can easily be run on the same k8s cluster and integrated into the existing cluster ecosystem to collect logs for all pods and services - Helm chart | Set Up an Elasticsearch, Fluentd and Kibana (EFK) Logging Stack on Kubernetes
  5. DNS: I’ve been testing it on a few projects and it has been working great; this can also be added as an optional instruction/argument on installation.
  6. Cloud: I can assist with several providers as I have a few running clusters.

Like I said I would gladly contribute and I think that this can be merged upstream to the official open-balena repository.

Update for everyone that’s watching this topic or using the k8s chart of openBalena: I’ve made some changes.

The DB, Redis, Registry and S3 (Minio) containers all use PersistentVolumes. Because of this, they’re stateful containers, so I’ve changed them to StatefulSets. For everyone who doesn’t know the difference:

A Deployment is a resource to deploy a stateless application; if it uses a PVC, all replicas share the same Volume and none of them has its own state. A StatefulSet is used for stateful applications; each replica of the pod has its own state and uses its own Volume.

Based on my knowledge so far, this seems a better fit for those containers.
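For illustration, a StatefulSet gives each replica its own PersistentVolumeClaim via volumeClaimTemplates. A minimal sketch for the database (names, image and sizes are illustrative, not the chart’s exact values):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: openbalena-db               # illustrative name
spec:
  serviceName: openbalena-db
  replicas: 1
  selector:
    matchLabels:
      app: openbalena-db
  template:
    metadata:
      labels:
        app: openbalena-db
    spec:
      containers:
        - name: postgres
          image: postgres:11        # illustrative image
          env:
            - name: POSTGRES_PASSWORD
              value: <db-password>  # placeholder
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # Each replica gets its own PVC: data-openbalena-db-0, data-openbalena-db-1, ...
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi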


For everyone that uses k8s for openBalena at the moment
Because StatefulSets are now used for those containers, you’ll have to migrate your data, so make a backup first. The StatefulSets will create new PersistentVolumes, and the old PersistentVolumes will probably get deleted.

I’m using a cloud provider’s S3, so I only had to migrate the database. I used this StackOverflow answer to back up my Postgres database and restore it into the StatefulSet database. When you’re doing this, scale the API replicas to 0 so nothing uses the Postgres database; otherwise you’ll get errors.
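The gist of that migration, translated to kubectl (pod names, user and database are placeholders):

# Dump from the old Deployment-backed pod...
$ kubectl exec <old-db-pod> -- pg_dump -U <db-user> <db-name> > backup.sql
# ...and restore into the new StatefulSet pod.
$ kubectl exec -i <new-db-pod> -- psql -U <db-user> <db-name> < backup.sql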


Some questions for the Balena team (out of curiosity):

  • Why does Redis need a persistent volume? Isn’t it stateless?

  • Why does the Registry need a persistent volume? Isn’t it stateless?
    And by stateless I mean: if the data for those containers gets lost and a new container starts with a fresh volume, what happens then?

  • What containers need to be restarted when one of the containers stops? (For example, I know the API has to be restarted when the DB container is restarted.)

  • Can Postgres run in a cluster by default, or with some added options?

We’ve been using openBalena with k8s for a couple of weeks now, and all seems steady. We don’t have devices running in the field at this moment, because we’re waiting for the new update of openBalena (hopefully it comes soon, because otherwise we’ll have no option but to ship anyway). But the Kubernetes chart is working nicely!