Well, we’re hitting a few pretty nasty issues that seem to happen at power-up. Once the device is powered up and stable, it tends to run nicely; getting to that point can be tricky. Honestly, the supervisor restarting Nemo at boot (the “network changes” issue) is not quite as critical at the moment.
My biggest concern is this ticket: “Chronyc config is bad if device comes online without internet”, which has expanded since we first wrote it up and now discusses at least two issues, maybe three. Basically, we’ve experienced multiple problems on boot where Nemo can’t talk to the supervisor and can’t seem to reach the internet, and we can’t get a shell into the container; we get a fork error from Docker instead. In these cases, Nemo doesn’t get restarted the way it does with the “network changes” error, either intentionally or accidentally. The result is that the software running on Nemo just spins. Nemo is the UI back end, so customers see this directly as “device is busted” and have to power cycle once, or sometimes more than once, to get it to recover.
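To make the “just spins” failure concrete, here is a rough sketch of the kind of bounded startup check we could add on our side so the process gives up instead of looping forever. This is not what Nemo does today; `wait_until_ready` and the `check` callable are hypothetical names, and it assumes the container would be restarted if the process exits with a non-zero status, which we would need to confirm.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times with exponential backoff.

    Returns True as soon as check() succeeds, False if every attempt fails.
    Exceptions raised by check() are counted as failed attempts.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # treat any error as "not ready yet"
        # Only wait if another attempt is coming.
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

At startup, the idea would be to pass in a check that tries to reach the supervisor, and to call `sys.exit(1)` when `wait_until_ready` returns `False`, rather than continuing to retry indefinitely.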
I think at least one of the causes has to do with powering the device down for multiple days or longer and then turning it on in an environment where internet isn’t available (which is typical for some customers). Our devices don’t have an RTC, so when they come up their date is stale until chrony gets an NTP sync. Either the stale date, or the time sync itself, or something else we haven’t identified seems to cause all kinds of havoc.
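For what it’s worth, here is a minimal sketch of how we might detect the “date is old” state from inside the container, so we can at least log it when things go wrong. It makes two assumptions: that `chronyc tracking` prints a `Leap status` line reading `Not synchronised` until the clock has synced (worth verifying against the chrony version on the device), and that any date before a known floor is stale. `BUILD_TIME` is a made-up placeholder, not a real value from our build.

```python
from datetime import datetime, timezone

# Hypothetical floor: a date the firmware is known to postdate.
# Replace with a real build timestamp; this value is only a placeholder.
BUILD_TIME = datetime(2020, 1, 1, tzinfo=timezone.utc)


def clock_is_plausible(now: datetime, floor: datetime = BUILD_TIME) -> bool:
    """Return False when the clock reads earlier than the floor date."""
    return now >= floor


def chrony_is_synchronised(tracking_output: str) -> bool:
    """Inspect the text printed by `chronyc tracking`.

    Assumes a line of the form "Leap status     : <value>", where the
    value is "Not synchronised" until chrony has synced the clock.
    Returns False if no such line is found.
    """
    for line in tracking_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "leap status":
            return value.strip().lower() != "not synchronised"
    return False
```

The text passed to `chrony_is_synchronised` would come from running `chronyc tracking` on the device, and `clock_is_plausible(datetime.now(timezone.utc))` gives a second check that doesn’t depend on chrony at all.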