Is there a mechanism for a balena device to automatically recover a corrupt application partition (for example, reverting it to the last flash state)?
I read this post which says that the only risky writes are user application writes. My application writes to a log but otherwise does not make writes that need to be preserved between boots. I don’t mind blowing away my logs as part of a corruption recovery process, but I’d prefer not to disable logging altogether. Is there a way to revert the data partition automatically on a corrupt boot, or could I modify balena to make the data partition read-only and write my logs out to a separate read-write partition?
If I’m trying to do something that is complex/foolish to do in balena, please let me know. I was initially looking at a barebones approach using Buildroot (Yocto’s complexity put me off), but balena already has a lot of the things I need (such as an A/B flashing strategy).
Hi Graham, balenaOS will already run fsck on boot on all partitions - what other mechanism to automatically recover a partition were you thinking about?
The data partition cannot be made read-only, as it’s used as the engine’s storage. Also, balenaOS will happily boot even if the data partition is empty, so that could serve as a recovery mechanism if, for example, the engine state is corrupted.
Thanks for getting back to me. Data being read-write makes sense, I suspected that was the case. Good to know re. fsck.
If the Data partition became corrupted, could it be re-populated with Docker image(s) to restore it, or is Data the same place that the images are kept? I was envisioning a recovery system where the Data partition could be restored to its initial state on boot (if necessary). Our devices won’t be online, so we are looking for a recovery option that doesn’t require cloud or manual intervention.
Hi Graham, if the data partition becomes corrupted and cannot be accessed even after a fsck, the OS will attempt to recover by erasing it. When balenaOS boots with an empty data partition, a supervisor is re-downloaded, and the supervisor will re-download all container applications to match the cloud state.
Unfortunately the system is not designed to recover if there is no cloud connectivity.
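To make the flow above easier to follow, here is a rough sketch of the decision logic as described in this thread. It is only an illustrative model, not balenaOS code: the function name, the `Outcome` values, and the two boolean inputs are all hypothetical and simply stand in for "did fsck leave the data partition usable" and "can the device reach the cloud".

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for the three cases discussed in the thread.
    BOOT_NORMALLY = "data partition usable, boot as usual"
    RESTORED_FROM_CLOUD = "data partition erased, supervisor and containers re-downloaded"
    NO_APPLICATION = "data partition erased, nothing available to restore from"


def recover_data_partition(fsck_ok: bool, cloud_reachable: bool) -> Outcome:
    """Model the recovery flow described above (illustrative only)."""
    if fsck_ok:
        # fsck runs on boot; if the partition is accessible afterwards, it is used as-is.
        return Outcome.BOOT_NORMALLY
    # Still inaccessible after fsck: the partition is erased and the OS boots with empty data.
    if cloud_reachable:
        # The supervisor and the container applications are fetched again to match the cloud state.
        return Outcome.RESTORED_FROM_CLOUD
    # Offline devices have no source to repopulate the data partition from.
    return Outcome.NO_APPLICATION
```

For an offline device, the relevant case is `recover_data_partition(False, False)`, which is the one the system is not designed to recover from.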