Unexpected supervisor crash/container restart on network change

adamshapiro0 · March 29, 2021, 8:34pm

We’re still experiencing a number of host OS/supervisor-related issues that are affecting our customers. In particular, we’re seeing problems communicating with the supervisor when a device boots with the local network available but the internet not yet available. In that case, our nemo service periodically hits one of two failures: either the OCI runtime exec failed issue described above, where the service appears to be running but can’t reach DNS and we can’t SSH into it, or DNS works but the service gets 404 responses from the supervisor when it calls the restart-service API endpoint.

The 404 is very strange: the request is clearly reaching the supervisor, since we get a response at all, and we know the REST query is good because the exact same query works normally. Why would it get a 404?
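For reference, here is a minimal sketch of the kind of call involved, with a retry wrapper that records every status it sees. It does not explain the 404; it only shows one way to ride out a supervisor that is not ready yet and to log what came back. The URL shape (`/v2/applications/<appId>/restart-service` with an `apikey` query parameter and a `serviceName` body field) is an assumption based on the documented supervisor v2 API and may not match what our 2.50.1 devices expose. The function names, retry count, and delays are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def build_restart_request(address, app_id, api_key, service_name):
    """Build the POST request for the (assumed) v2 restart-service endpoint."""
    url = f"{address}/v2/applications/{app_id}/restart-service?apikey={api_key}"
    body = json.dumps({"serviceName": service_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request; return the HTTP status code, or None if unreachable."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. the 404 described above
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout


def restart_with_retry(do_send, attempts=5, delay=2.0, sleep=time.sleep):
    """Call do_send() until it returns a 2xx status; keep every status seen."""
    statuses = []
    for attempt in range(attempts):
        status = do_send()
        statuses.append(status)
        if status is not None and 200 <= status < 300:
            return True, statuses
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))  # exponential backoff between tries
    return False, statuses
```

Inside a container this would be used roughly as `restart_with_retry(lambda: send(build_restart_request(address, app_id, api_key, "nemo")))`, where the address, app ID, and API key are read from the supervisor-provided environment variables (assumed here to be `BALENA_SUPERVISOR_ADDRESS`, `BALENA_APP_ID`, and `BALENA_SUPERVISOR_API_KEY`). The returned list of statuses makes it easy to see whether the 404 is transient or persists across the whole backoff window.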

We’re still running a very old Balena release and supervisor on our devices (2.50.1). We very much want to update to the latest, particularly to solve system time issues when booting a device (Chronyc config is bad if device comes online without internet - #51 by alexgg). Right now, though, the UART is dying on the latest Balena release (2.71.3), so we’re trying to get that resolved as soon as possible (UART Failure after upgrading to 2.71.3+rev1 from 2.50.1+rev1 on Variscite IMX8M - #22 by acostach).
