First things first: my use case is definitely non-standard, so I’m open to the possibility that I’m doing something wrong, and I understand that any support will be limited at best.
I’ve got a bunch of devices running a custom build of ResinOS on Raspberry Pi Compute Module 3s. I recently found that two separate devices stopped working due to a corrupted state partition. I suspect, but have not confirmed, that this corruption is being caused by power failure during a write operation. The affected devices were not corrupted at the same time or in the same circumstances, so some environmental causes can probably be ruled out.
The corruption left the filesystem on the partition unmountable. With the state partition unmounted, many important things break, such as the Docker and Dropbear SSH services.
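In case it helps anyone check for the same symptom, here is a minimal Python sketch of how the mount failure could be detected by looking through `/proc/mounts`. The `/mnt/state` mount point and the helper names `is_mounted` and `read_proc_mounts` are assumptions for illustration; adjust them to wherever the state partition is mounted on your build.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each /proc/mounts line is: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as f:
        return f.read()


# Assumed mount point for the state partition; hypothetical, not confirmed.
STATE_MOUNT_POINT = "/mnt/state"

# On a device, one would call:
#   is_mounted(STATE_MOUNT_POINT, read_proc_mounts())
```

A check like this only reports the symptom (the partition is not mounted); it says nothing about why the mount failed.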
Some other info: as mentioned, I’m using a custom build. It is based on Resin v2.3.0. The main difference that I believe may be relevant to this problem is that I’m using version 4.9.24 of the kernel, as opposed to the 4.4 that Resin uses by default.
My questions would be:
Has anyone else seen similar corruption of the state partition?
Is this to be expected if the devices are regularly shut down due to power loss, or should they be more resilient?
Could the corruption be caused by something other than unexpected shutdowns?
Are there any ways I could protect against this corruption (besides the obvious step of shutting down properly, which I will try to do, but I still need to account for potential power loss)?
Thanks for your help!