We’ve had 3 remote production devices get into a strange state. We first noticed it because our application containers appeared to have stopped running.
This state is characterized by the following: the application containers are stopped, but the device shows “Online” on the balena site. The device seems to accept a “Restart” request, but then goes into an infinite loop of trying to start the container and failing, with this message in the balena logs:
22.01.19 09:04:20 (-0800) Failed to start service 'main sha256:d190a908c4728f103821c160c827ceed4a6e94be932abd2fcad7ac4972f5f928' due to '(HTTP code 500) server error - error while creating mount source path '/tmp/balena-supervisor/services/485241/main': mkdir /tmp/balena-supervisor: file exists '
Issuing a “Reboot” request restores the devices to a fully operational status.
Why is this happening, and how can we prevent it? What can we look into?
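One thing we could check on an affected device before rebooting it is what actually occupies /tmp/balena-supervisor, since mkdir reports "file exists" whenever the path is already taken by something, including a plain file or a symlink whose target is gone. Below is a minimal sketch of such a check. It assumes Python 3 is available wherever it is run (which may not be true on the host OS), and `describe_path` is a hypothetical helper name, not part of any balena tooling.

```python
import os
import stat


def describe_path(path):
    """Return a short label for whatever occupies `path`, without following symlinks."""
    try:
        st = os.lstat(path)  # lstat inspects the link itself, not its target
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        target = os.readlink(path)
        # os.path.exists follows the link, so False here means the target is gone
        state = "ok" if os.path.exists(path) else "dangling"
        return f"symlink ({state}) -> {target}"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # Paths taken from the error message above
    for p in ("/tmp", "/tmp/balena-supervisor"):
        print(p, "=>", describe_path(p))
```

If the second path comes back as anything other than "directory" or "missing", that would at least narrow down what is blocking the mount source path from being created. It would not by itself explain how the path got into that state.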