We are evaluating an Intel-based board with a TPM 2.0 embedded on the board. So to me that sounds like a good fit. The Yubikey-based FDE also sounds interesting, but that would probably require us to mount an encrypted partition in a container ourselves. The drawback of that solution is still that it does not really protect against basic physical threats, since you only need access to the Yubikey itself.
@cees.koolen, mTLS is the approach used for communication between the key server and the device. It does, however, place the security in the hands of whoever manages the network. That is, there would be ways around it if you had no WiFi password on your network or anyone could access an ethernet port and jump on your network (steal a device, extract the mTLS certificate, connect back to the network the device was on, request the key, go away and decrypt the device). If hosting from an online service, the server could be configured to only accept requests from IP addresses of your network.
Certainly not an alternative to the ongoing work on full disk encryption, but it may provide an avenue to place a level of security of content into the hands of users. It will, however, only be as secure as the weakest link.
I will be sure to drop a message here when it comes to fruition.
@maggie0002 thanks for the further explanation. I’m happy to carry that train of thought a bit further as well.
So there is an mTLS connection between the “vault” or “secrets server” and the device that needs those secrets. But that pushes the problem to where we store the mTLS client key, would it not?
It does push the problem to the key storage, or more specifically it makes decryption dependent on two places: the mTLS key on the client, and the server serving the encryption key with the matching mTLS key. So someone would need both the mTLS key from the device and access to the server serving the key. Which is where network security, or security of the online server, comes into play.
A few scenarios. (1) Bring the ‘secrets server’ on to the network while the devices boot and remove it again (a sort of boot key). (2) Host the secrets server on an online VPS, and restrict access to that server based on the IP address of the network the device is connecting from. (3) Ensure the security of the network the devices run on, for example make sure nobody has your WiFi password; the security of the content on each device then rests on securing the one key server.
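To make the "two places" point concrete, here is a minimal sketch of the server side of mTLS using Python's standard `ssl` module. It is not the secure-store implementation; the function name and the optional file arguments are assumptions made so the sketch can be exercised without real certificate files.

```python
import ssl
from typing import Optional


def build_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS server context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what turns plain TLS into mutual TLS: the handshake
    # fails unless the client presents a certificate the server can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity (hypothetical paths supplied by the caller).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA that signed the client certificates stored on the devices.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The client key on the device and the server holding the matching trust chain are the two places an attacker would need to reach.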
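Scenario (2) could be sketched roughly as below, assuming the server can see the real source address of each request. The network ranges are placeholder documentation addresses and `is_allowed` is a hypothetical helper, not part of any existing project. In practice this kind of filter often lives in a firewall rule rather than in application code, and it complements mTLS rather than replacing it.

```python
import ipaddress

# Hypothetical allowlist: the public ranges of the networks the devices boot from.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder for an office network
    ipaddress.ip_network("198.51.100.7/32"),  # placeholder for a single static address
]


def is_allowed(remote_addr: str) -> bool:
    """Return True if the requesting address falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(addr in network for network in ALLOWED_NETWORKS)
```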
It’s hard to elaborate specifics without knowing specific scenarios, and certainly not without gaps.
I think I’ll simmer a bit on the topic to see if we could make it fit. It also depends a bit on how long it will take the OS team to finish the awesome work they are doing.
@cees.koolen, while I am still not sure this is going to be right for your particular use case (your hardware potentially permits other options) here is a go at some of the ideas we had discussed: GitHub - maggie0002/secure-store
All the usual caveats, it is an experimental project just to see what potential there may be.
@maggie0002 thank you for pointing us in the direction of that experiment and keeping this thread alive.
What I was thinking about is that we might be able to use the Balena VPN as the secure environment for the secrets store. If we can somehow validate the identity of the requester through the Balena API / VPN we can ensure that we only deliver the keys to devices that are in a particular fleet.
For example if we use the Balena API to forward a port to the device that requested the secrets and send the secrets through that forwarded port, we can ensure that only devices that are members of the fleet will be able to receive the data.
Obviously that still means that we need to trust the storage of the VPN keys on the device but it will add a layer of trust by verifying the identity of the requester without the need of storing mTLS details on the device.
@cees.koolen, I’m eager to keep brainstorming it, this is helpful. I have read your post a few times though and am not sure I fully understand the idea. Would it be:
Device contains no env variables → device successfully communicates with Balena through the VPN → because of the successful communication the environment variables are now available.
Which then moves the point of security to securing the balena API keys?
@maggie0002 My idea is that when the Secure Store Client requests a secret from the Secure Store Server, the Secure Store Server then verifies the ID of the client through the Balena API / VPN.
So for the Server to be able to do that, it indeed needs some API keys that need to be kept safe there… but since the server already contains all the secrets, adding these there might not be that big an issue.
@cees.koolen I think I see now, this is very interesting!
So Secure store client passes its details to secure store server → secure store server verifies the details for the client with the Cloud → if valid it passes the unlock key to the client and the client decrypts.
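The flow described above can be sketched as follows. This is only an illustration under assumptions: the class and parameter names are hypothetical, and the Cloud lookup is replaced by an in-memory set, because the actual Balena API call is not shown in this thread.

```python
from typing import Callable, Optional


class SecureStoreServer:
    """Hands out the unlock key only to devices the Cloud still recognises."""

    def __init__(self, unlock_key: bytes, is_fleet_member: Callable[[str], bool]):
        self._unlock_key = unlock_key
        # In a real deployment this callable would query the Cloud; here it is injected
        # so the decision logic can be shown without inventing an API.
        self._is_fleet_member = is_fleet_member

    def request_key(self, device_id: str) -> Optional[bytes]:
        """Return the unlock key for a known device, or None for anything else."""
        if self._is_fleet_member(device_id):
            return self._unlock_key
        return None


# Stand-in for the fleet membership held by the Cloud (assumption for illustration).
fleet = {"device-a", "device-b"}
server = SecureStoreServer(b"example-unlock-key", lambda device_id: device_id in fleet)
```

Removing an entry from `fleet` makes the next `request_key` call for that device return `None`, which is the deprovisioning behaviour discussed in the following post.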
What could be really good about that is it would mean if someone was to remove a client device from the Cloud (such as one that is lost or stolen) then that device would no longer be able to decrypt the content under any circumstances. It would be a way to deprovision a device.
I think it would be a blocker for offline devices though, so perhaps would best be optional? I do like the idea of having fewer steps by not needing the MTLS keys, but the MTLS keys also provide the secure offline option (by offline I mean on a network, but without internet access) and secure the traffic in transit between the devices. The latter may be overcome by simple TLS, but then we wouldn’t want to verify a certificate against an external key server, both for the hassle of managing it and for offline mode, and overriding that is something that just kinda feels clunky.
I did look at one point of trying to put the MTLS keys in the Cloud as environment variables. It may reduce the friction of the setup a little. Technically, if the Secure Store Server has a more permissive API key for the Cloud then perhaps the server could generate the MTLS keys and add them as environment variables for the entire fleet automatically (that is assuming we could store it in a single line environment variable and then extract it again in the right format). My concern with this is on first add it would restart the containers on all the devices, even those without any of the secure client or server, the entire fleet (adding env variables to devices in the Cloud restarts the containers on attached devices). I’m not sure how big a deal that is.
There is lots of thinking out loud here, I will keep mulling it over. I think your idea is really good. If you have others on the above, would be great to hear about it. Thinking through whether to MTLS certificate or not to MTLS certificate is the question (or one of the big ones); user hassle of the setup vs security vs what happens if someone wants to replace the certificates.
@maggie0002 I agree that the scenario for offline systems is really different.
Also the idea of provisioning the mTLS certificates through the API falls apart in that scenario since changing the environment variables on offline devices will not do anything until the device connects to the Balena servers again. For systems that consist of multiple services it could be done by just setting the variable for the specific service that requires it. That would at least make the action less intrusive. For the time being, I’ve used this method of setting mTLS certificates on one of our services since that gave us the opportunity to continue development of the application without depending too much on the final solution for the key management. We used the base64 encoding method such that we could set the keys as a single line of code in the environment flags.
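For anyone unfamiliar with the base64 method mentioned above, a minimal sketch: a multi-line PEM block is encoded to a single line so it fits in an environment variable, then decoded again inside the container. The helper names and the placeholder PEM text are assumptions for illustration.

```python
import base64


def encode_for_env(pem_text: str) -> str:
    """Turn a multi-line PEM block into a single-line value for an environment variable."""
    return base64.b64encode(pem_text.encode("utf-8")).decode("ascii")


def decode_from_env(value: str) -> str:
    """Recover the original PEM text; validate=True rejects stray characters."""
    return base64.b64decode(value, validate=True).decode("utf-8")


# Placeholder content only, not a real certificate.
PLACEHOLDER_PEM = (
    "-----BEGIN CERTIFICATE-----\n"
    "placeholder-body\n"
    "-----END CERTIFICATE-----\n"
)
```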
As you wrote in your reply, I would indeed still use regular TLS to connect to the Secure Store Server and giving the clients a certificate to verify its authenticity is rather simple but really important.
I still think that for the offline scenario having the Full Disk Encryption with Secure Boot would be a life saver.
The idea behind provisioning them through the API wasn’t for the offline devices, just in terms of it being an easier setup. I assumed the reason you had said “without the need of storing mTLS details” was because it is a bit of hassle to manage?
Setting it for the specific service seems like a nice idea, that way it would only trigger restarts on devices that have the secure store client. Which seems like a restart would be necessary anyway, otherwise why are they running the client.
“I still think that for the offline scenario having the Full Disk Encryption with Secure Boot would be a life saver.”
Absolutely, for offline and online Full Disk and Secure Boot would be far better, and certainly none of this detracts from that work. Purely an exercise for users without TPM.
Deploying the mTLS certificates through the Balena API effectively also is proving the identity of the clients through the Balena VPN.
Except I think then the MTLS certificates would remain on the device, and would keep them even if the device was removed from the fleet at the Cloud level (at least until it connected to the Cloud and then uninstalled the containers and cleaned up the images).
By having the MTLS keys as environment variables, and then having the server verify the client details with the Cloud too, we could deprovision a device, whereas if the MTLS keys were compromised there would be no way to deprovision without rolling out new MTLS keys. Perhaps rolling out new MTLS keys isn’t such a big deal, as long as the container restart isn’t an issue. I wonder if it may be better to have the provisioning and deprovisioning at a client level rather than the whole fleet though, then by simply deleting a device from the Cloud, it is also deprovisioned from the secure store server without any extra steps. Downsides: having the server do the verification means adding a very permissive API key to the server, which makes the server more of a risk. Plus doing client → server → cloud verification is more work to implement than just using Cloud-stored MTLS keys, for which the functionality is basically already there.
I was thinking that instead of mTLS it is also possible to “just” store a JWT token in the environment of the client. With the proper fields in there, it is also possible to revoke the token from the server side without the need to erase it from the device as well. We could still push the JWT token from the server to the Balena Cloud using the API to make it dynamic and do auto-rotation of the keys (depending on how often the devices are expected to check in with the server) but we would not need the mess of storing an mTLS key pair in the device environment using base64 encoding etc.
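A rough sketch of the server-side revocation idea, using a hand-rolled HS256 token built from the standard library purely for illustration. The function names, the `jti`-based revocation set, and the claim layout are assumptions; a real deployment would more likely use a maintained JWT library, and the token would still need to travel over TLS.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Set


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, device_id: str, token_id: str,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a signed token carrying the device ID, a token ID and an expiry."""
    issued_at = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = _b64url(json.dumps(
        {"sub": device_id, "jti": token_id, "exp": issued_at + ttl_seconds}
    ).encode("utf-8"))
    signing_input = f"{header}.{claims}"
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(secret: bytes, token: str, revoked_ids: Set[str],
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is genuine, unexpired and not revoked."""
    current = time.time() if now is None else now
    try:
        header, claims, signature = token.split(".")
        expected = _sign(secret, f"{header}.{claims}")
        if not hmac.compare_digest(expected, _b64url_decode(signature)):
            return None
        payload = json.loads(_b64url_decode(claims))
    except ValueError:
        # Malformed token: wrong number of parts, bad base64 or bad JSON.
        return None
    if payload.get("exp", 0) <= current or payload.get("jti") in revoked_ids:
        return None
    return payload
```

Adding a token ID to `revoked_ids` on the server rejects that token from then on, even though a copy still sits in the device environment. A short `ttl_seconds` gives the auto-rotation effect, at the cost of devices needing to check in more often.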
Perfect timing, I was just sitting here thinking through some options.
I am certainly onboard with adopting one of these methods we are discussing, I think your idea on having the MTLS keys in the env variables has been the one that seems to keep surfacing to the top the most.
“… instead of mTLS it is also possible to ‘just’ store a JWT token in the environment of the client”
This is another one I will have to throw in the pot. First thoughts are that I’m not sure if we would want to use it as a substitute for MTLS/TLS, I think a JWT token exchange would be sending the key through an insecure communication channel (like sending your password over http rather than https). MTLS secures the traffic in transit. Unless the proposal is to move everything to the Cloud, but then we have the vulnerability of the Cloud API keys on the client, which although would allow revoking wouldn’t have the ability for the key to be withdrawn by default.
So many moving parts, I will definitely be pondering the JWT token idea though. My unfinished thought when I read your post was whether authentication could be done via a third party platform like Auth0 (which I’m not particularly familiar with but was the first one that sprang to mind). Even if the JWT approach had to be done with MTLS too, at least we could make the MTLS process invisible by having it auto configure, and only expose the JWT key management options to the user, which would be far less friction.
It’s quite complex visualising all the moving parts, I’m thinking we may need a diagram because I am staring at the words but the moving parts are not falling into place.
“What I was thinking about is that we might be able to use the Balena VPN as the secure environment for the secrets store. If we can somehow validate the identity of the requester through the Balena API / VPN we can ensure that we only deliver the keys to devices that are in a particular fleet.”
@amrishhpuri, I think you are right, it is similar to what @cees.koolen mentioned above. It seems the most likely scenario right now is to allow the MTLS keys to be in the environment variables in the Cloud then only devices that have access to the Cloud will be able to decrypt.
That is basically there already, it would just need to convert the key from base64, prioritise keys over files, and update the docs: secure-store/mtls.go at 3558f947fe3bd3416d3ebaec2a94aefd53688f6b · maggie0002/secure-store · GitHub
We could put the decryption password on the Cloud too, rather than on another piece of hardware, but it wouldn’t provide as good security because we wouldn’t be able to remove the Cloud from the equation. For example, right now the server is kept on separate hardware which you can protect or remove from your network, if we move all that functionality to the Cloud then there is no way to stop exposing it. Effectively it would mean someone wouldn’t be able to read the SD card and copy the content off it because it would be encrypted, but instead they could copy the Balena Cloud API keys from it, use the key to query the Cloud for the password, then decrypt the content with the password.
There are other options too, I thought about notifications on a dashboard or your phone requesting whether a device can be decrypted, and if you click yes then it passes the decryption key.
@cees.koolen, someone has also queried whether it would be worth using the TPM for Secure Store. We tend to think about TPM for full disk encryption, secure boot etc, and that is being worked on, but you can also use TPM to generate keys for other things. It could be used from inside a container to generate the key that decrypts the Secure Store content, instead of using the environment variables in the Cloud. It has some promise, but without secure boot, someone could potentially put their own software on the hardware, extract the key from TPM, then decrypt the content with that key.
A quick update. A few things in the PR here to be merged (Updates by maggie0002 · Pull Request #1 · balena-labs-research/secure-store · GitHub), any thoughts or testing welcomed and appreciated:
- Allow adding the MTLS certificates as environment variables instead of files and prioritise environment variables over files. This creates a dependency on access to the Balena Cloud for the decryption to function.
- Allow creation of the encrypted mount locally by passing in a password. In other words, running without the need for the server, but the user will need to implement their own steps to protect the password stored on the device. This is helpful for those who want to fork the project, and may want to embed alternative security measures into the binary.
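The first item in the list above (environment variables taking priority over files) might look roughly like this. The project itself is written in Go and this is not its code; the sketch is in Python only to illustrate the lookup order, and the variable name `MTLS_CERT` and the function `load_pem` are hypothetical.

```python
import base64
import os
from pathlib import Path
from typing import Mapping, Optional


def load_pem(env_name: str, file_path: str,
             environ: Optional[Mapping[str, str]] = None) -> bytes:
    """Return certificate bytes from the environment if set, otherwise from a file."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name)
    if value:
        # The environment value is expected to be base64 so it fits on one line.
        return base64.b64decode(value, validate=True)
    # Fall back to the file on disk when no environment variable is provided.
    return Path(file_path).read_bytes()
```

A call such as `load_pem("MTLS_CERT", "/certs/client.pem")` would then use the Cloud-provided value when present, which is what ties decryption to access to the Cloud.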
I think this covers the low hanging fruit from above, and will keep exploring other options, happy to hear any thoughts.
For the encrypted mount, the benefit to consider is that it can hold many secrets and only needs to be unlocked once, whereas if we use that same password to encrypt each secret individually, we need to keep the password available for much longer.