Certificate based authentication for SSH

TL;DR: You can make a device trust any key signed by your SSH certificate authority by adding a key entry like cert-authority $CA_PUBLIC_KEY to its config.json.

I thought I’d write up something we implemented earlier this week: using SSH’s support for certificate based authentication to allow SSHing to devices without having to place specific public keys on each device, which would then need rotating as people join and leave the company.

If you don’t have any public keys in a device’s config.json the only way to authenticate is by the device sending your public key and username to Balena’s API to validate you’re allowed to connect. This is usually fine, and means that access can be revoked immediately if needed, but becomes problematic if the device has no connectivity to the Balena API. We had this happen recently where we could get to the device via an alternative route, but were unable to SSH in to fix things because auth failed.

We could pre-populate devices with one or more public keys attached to staff members, or with a shared public key whose private key is saved in 1Password, but then we’d need to rotate those keys on a regular basis to remain in compliance with security policies. Thankfully OpenSSH supports the use of signed keys, which allow a user’s public key to be cryptographically signed by a certificate authority, with devices permitting any key signed by that CA to authenticate. Signed keys can also carry restrictions: which users they’re valid for, an expiry date, and which source IPs they can be used from. You can find some background in the Wikibooks OpenSSH Cookbook chapter on Certificate-based Authentication.
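As a rough illustration of what signing with those restrictions looks like, here is a minimal Python sketch that only assembles an ssh-keygen command line. The helper name, file names, identity, the "root" principal and the 30 minute validity window are all assumptions made for the example, not details of our actual script; the flags themselves (-s, -I, -n, -V, -O source-address=) are standard ssh-keygen options.

```python
import shlex


def build_sign_command(ca_key_path, user_pubkey_path, identity, principals,
                       validity="+30m", source_addresses=None):
    """Return the ssh-keygen argument list that signs a user's public key.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "ssh-keygen",
        "-s", ca_key_path,           # CA private key used to sign
        "-I", identity,              # key identity, recorded in sshd logs
        "-n", ",".join(principals),  # usernames the certificate is valid for
        "-V", validity,              # validity interval, e.g. +30m
    ]
    if source_addresses:
        # Restrict which source IPs/CIDR ranges may use the certificate.
        cmd += ["-O", "source-address=" + ",".join(source_addresses)]
    cmd.append(user_pubkey_path)     # public key to be signed
    return cmd


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    command = build_sign_command(
        "ca_key", "id_ed25519.pub", "jane@example.com", ["root"],
        source_addresses=["10.0.0.0/8"],
    )
    print(shlex.join(command))
```

Running the printed command writes the certificate next to the public key (id_ed25519-cert.pub in this example), and ssh picks it up alongside the matching private key.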

The missing piece for us was configuring Balena devices to accept our certificate authority’s signatures. Most of the tutorials out in the wild state that you need to update sshd_config for this, which would have required us to fork balenaOS and manage our own images. However, OpenSSH’s documentation mentions (buried deep in some options) that you can also do this via an authorized_keys file… the very same file that gets populated by adding keys to config.json.
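To make that concrete, here is a minimal Python sketch of adding the entry to a parsed config.json. It assumes the SSH keys live in an os.sshKeys array, which is my understanding of the balenaOS layout; check this against your own config.json before relying on it. The function name and the CA key value are placeholders.

```python
import json


def add_cert_authority(config, ca_public_key):
    """Add a 'cert-authority <key>' entry to the config's SSH key list.

    Assumes keys are stored under config["os"]["sshKeys"] (an assumption,
    not confirmed in this post). Safe to call more than once.
    """
    entry = "cert-authority " + ca_public_key.strip()
    keys = config.setdefault("os", {}).setdefault("sshKeys", [])
    if entry not in keys:
        keys.append(entry)
    return config


if __name__ == "__main__":
    # Placeholder config and CA key, for illustration only.
    config = {"os": {"sshKeys": []}}
    ca_public_key = "ssh-ed25519 AAAA...placeholder ca@example"
    print(json.dumps(add_cert_authority(config, ca_public_key), indent=2))
```

The resulting authorized_keys line starts with the cert-authority option, which tells sshd to treat the key as a CA that signs user certificates rather than as a login key in its own right.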

The final step was getting that rolled out to our existing fleet of devices. That is very handily done using the script published on GitHub at balena-io-experimental/ssh-key-insert (tooling to insert the relevant SSH keys into balena devices' configuration), which we used to SSH into each device in turn and add the CA’s public key.

Hi Jon,

This is really interesting, thanks for sharing.

Could you elaborate a bit on this section above? Which certificate authority are you using, and were you able to modify and revoke those keys?

We’re currently running our own internal CA (which is an incredibly grand name to attach to a private key stored in 1Password, and a very short shell script to sign public keys with) - it’s effectively just following the instructions in the linked guide for certificate based auth. Everything is very manual, which is generally fine given this is the last resort option for gaining access to devices in case of emergency. Day to day we’re still using key validation via the Balena API.

Key revocation isn’t possible using this setup, as there’s no way to push a key revocation list to devices via this method (and I’m not sure we’d want to via the path of rewriting config.json anyway), but we work around that by signing keys with only half an hour or so of validity. There is of course always a risk of a key being compromised, but for our use case this is an acceptable balance when put alongside the potential for remote devices being entirely inaccessible without sending out an engineer.

At some point in the future I’m likely to move from the current string and sticky tape solution to deploying something like HashiCorp Vault, which should allow us to exercise much tighter control of the CA key, and give us an audit log of when users have requested an SSH key be signed.

In an absolutely ideal world, this would all be supported as part of Balena Cloud, which I suspect could support running an SSH CA for each fleet of devices with balena ssh making a call to get the user’s public key signed.

This is all really interesting, thanks for sharing. I am going to pass this around everyone on the team to have a read too. Could you let us know how things go with your move to HashiCorp Vault? It would be great to be able to follow along.