Every morning my devices reboot in order to reestablish their cellular connection.
(This may be an issue in itself. Perhaps I should fix the cell connection rather than rebooting, but I’ll still need to reboot occasionally when every other step to restore the connection has failed, so this thread is still relevant.)
The issue began when I upgraded from balenaOS 2.31.5+rev1 to balenaOS 2.45.1+rev1. I’m working on getting my application to work on higher OS versions so downgrading is not a great option.
Problem
When the device reboots, the first container that it starts doesn’t have access to the /dev/video0 device. I thought the device just needed more time to appear, but even with an infinite wait loop that only checks whether /dev/video0 exists, the container hangs indefinitely.
Restarting the container solves the problem. In other words, this is only an issue for the first container that’s started right after reboot.
I have granted support access to the device. You should expect to see:
Device /dev/video0 not found
Device /dev/video0 not found
Device /dev/video0 not found
...
Device /dev/video0 not found
Device /dev/video0 not found
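For reference, the wait loop that produces this output looks roughly like the following. This is a minimal Python sketch rather than the exact code on the device: the function name wait_for_device, the poll interval, and the optional timeout are assumptions added for illustration.

```python
import os
import time


def wait_for_device(path="/dev/video0", poll_seconds=1.0, timeout=None):
    """Poll until `path` exists.

    Returns True once the device node is present, or False if `timeout`
    seconds pass first. timeout=None (an assumed default) waits forever,
    which matches the infinite loop described above.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        print(f"Device {path} not found", flush=True)
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

Calling wait_for_device() with no arguments never returns in the first container after a reboot, because /dev/video0 never appears inside it. Passing a timeout would at least let the service fail loudly instead of hanging.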
Non-solutions
A potential solution is:
if "/dev/video0" is connected:
    run startup script
else:
    kill and restart container
I don’t want to do this because:
It’s not the right way to solve the problem—this isn’t happening on older OSs and I shouldn’t need to restart a container to get it to connect to a device that’s already there.
I want to keep restart: "no" in my docker-compose.yml so that I get emails when devices aren’t working and can SSH into the container to debug it with its (broken) state preserved.
I’ve just restarted the device to see if I can replicate it. From what I can tell the container is started before the USB bus has finished enumerating the devices. After reboot I was able to shell in and watch it, and the video devices aren’t present until at least 30 seconds after the container has started.
I’ll check with our balenaOS team and see if I can get any insight into what’s happening.
I’ve had a response from the team. Their suggestion is that, rather than delaying your service until the USB device is available, you start the service with UDEV enabled so that it can detect the camera being “connected”. Here’s a thread discussing a very similar approach: Docker container cannot access dynamically plugged USB devices
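To make the suggestion concrete, here is a minimal Python sketch of reacting to the camera node appearing instead of blocking startup on it. It assumes udev is already running in the container so that /dev is kept up to date, and it simply polls /dev rather than subscribing to udev events. The names snapshot, diff_devices, and watch, the /dev/video* pattern, and the one-second poll interval are all illustrative assumptions, not anything provided by balena.

```python
import glob
import time


def snapshot(pattern="/dev/video*"):
    """Return the set of device nodes currently matching `pattern`."""
    return set(glob.glob(pattern))


def diff_devices(before, after):
    """Return (added, removed) as sorted lists of device paths."""
    return sorted(after - before), sorted(before - after)


def watch(pattern="/dev/video*", poll_seconds=1.0, on_add=print, max_polls=None):
    """Call on_add(path) for every device node that shows up.

    Nodes already present on the first poll are reported too.
    max_polls=None polls forever; a number bounds the loop (handy in tests).
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        current = snapshot(pattern)
        added, _removed = diff_devices(seen, current)
        for path in added:
            on_add(path)
        seen = current
        polls += 1
        time.sleep(poll_seconds)
```

A service could pass its own camera start-up function as on_add, so it starts capturing whenever /dev/video0 appears, whether that is at boot or 30 seconds later.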
In the meantime, can you give me a high-level explanation of why this happens? Why is it that the OS doesn’t wait for devices to be loaded before starting the first container, but by the time a second container comes up it has no trouble passing in the device?
For one thing, as James mentioned above, the device shows up very late, and USB devices can show up at any time because they can be plugged in dynamically. I do not know whether it makes sense, or is even possible, to wait for USB device enumeration to complete before starting containers.
Looking at a privileged container in ‘normal’ (non-balena) Docker, I can see that changes to the device file system (like plugging in a new USB disk) generally do not update the /dev folder in a running container. I guess you need UDEV running in the container for that to happen.
@cnr have you tried running the container in privileged mode and adding UDEV=1? That would be the first step to see whether video0 shows up while the container is running.
Yes, setting UDEV=1 only works for balenalib-based containers because of their configuration. Is there a reason you are unable to use a balenalib base image? You’re right that it is possible to replicate the udev behaviour of our base images.
Yeah I had to avoid the balenalib base images for my build because I needed to support some Nvidia specific stuff. I’ve now had the image for a while and upgrading to a balenalib base image won’t be the solution I employ for this fix.
Could you please share any steps and resources you have for replicating the udev behavior of the balenalib base images?