We have several systems running in production on balenaOS 2.98.33. They are Intel x86 machines (11th Gen Intel(R) Core™ i3-1115G4 @ 3.00GHz) in an embedded touch panel PC.
Each system has 2 cameras connected via USB, and each camera consumes 2.1 W. The system is restarted from time to time (it depends on the user). Sometimes it runs for days, weeks or months without issues.
Now an issue appears at random (there is no pattern) with the USB: the camera restarts non-stop. We first thought it was a USB hardware issue, but we decided to purge the services and install them again. After doing that, the system works correctly without any problems (using exactly the same hardware setup and service versions). Before the purge, even a power cycle did not help: the camera kept restarting non-stop (which also raised the temperature of the panel PC).
I attach the dmesg logs to show the error -71 that it produces. dmesg.txt (232.3 KB)
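For anyone scanning a similar log: on Linux, -71 corresponds to EPROTO (protocol error). Below is a minimal Python sketch that counts such errors per USB port. It assumes the usual kernel message shape, for example "usb 1-2: device descriptor read/64, error -71". The function name count_usb_errors and the sample lines are illustrative and are not taken from the attached dmesg.txt.

```python
import re
from collections import Counter

# Matches kernel USB messages such as
# "usb 1-2: device descriptor read/64, error -71" (assumed message shape).
USB_ERROR = re.compile(r"usb (?P<port>[\d.-]+): .*error (?P<code>-\d+)")

def count_usb_errors(lines, code=-71):
    """Count how often a given USB error code appears for each port."""
    counts = Counter()
    for line in lines:
        m = USB_ERROR.search(line)
        if m and int(m.group("code")) == code:
            counts[m.group("port")] += 1
    return counts

# Illustrative sample lines (not copied from the attached dmesg.txt).
sample = [
    "[  101.120000] usb 1-2: device descriptor read/64, error -71",
    "[  101.560000] usb 1-2: device not accepting address 7, error -71",
    "[  102.010000] usb 1-4: new high-speed USB device number 8 using xhci_hcd",
    "[  103.300000] usb 1-3.1: device descriptor read/64, error -71",
]
print(count_usb_errors(sample))
```

Running it over the lines of a saved log (for example, the contents of a file opened with open(...)) would show whether the errors cluster on the ports the cameras use.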
Another important piece of information is that we also have an issue where sometimes the audio just stops working and a purge solves that.
Can you provide any guidance on:
How could we replicate this issue quicker? (this is the most important step, since, without this, we cannot confirm any solution…)
What is causing the issue itself?
How to solve this?
The issue is leaving us quite stunned and we have no idea how to debug this.
@alexandrepires5 this looks like an issue with the USB. Did you try other balenaOS versions? I know that you can’t do a hostOS update on openBalena, but did you try? Or with a different device type (e.g. generic x86 gpt)?
Hello @mpous. We have another system, which has slightly different hardware but uses the exact same cabling, cameras, software and OS, and this does not happen there.
We can try to update the OS directly to the newer version and see if that happens to fix it. But we have the repeatability issue, so we wouldn’t know for at least 3 months whether it fixed it.
Regarding the generic x86 gpt, the same issue would apply.
But still, it is weird that after a purge it works again without hiccups (until it happens again after a random amount of time). Perhaps it could be some cached or persistent state / memory in the OS that is causing the issue?
We managed to find the issue: exactly one of our files was being corrupted / deleted from the filesystem. This happens after restarting the container with a very specific timing. Sorry to disturb you with this issue! It can be considered closed.
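For anyone trying to reproduce a timing-dependent problem like this faster, one possible approach is to sweep the delay before each restart and check the file afterwards. The Python sketch below is only an illustration under assumptions: sweep_restart_timing, restart and file_ok are hypothetical names, and the demo uses stand-ins instead of a real container. It is not how we found the issue.

```python
import time

def sweep_restart_timing(restart, file_ok, delays, sleep=time.sleep):
    """Restart after each delay; return the delays after which the file check failed."""
    failures = []
    for delay in delays:
        sleep(delay)        # wait this long since the previous restart
        restart()           # hypothetical hook: restart the container
        if not file_ok():   # hypothetical hook: file exists and is readable
            failures.append(delay)
    return failures

# Demo with stand-ins: the fake "container" loses its file when it is
# restarted between 0.2 s and 0.3 s after the previous restart.
state = {"last_delay": 0.0, "ok": True}

def fake_sleep(delay):
    state["last_delay"] = delay  # record the delay instead of really sleeping

def fake_restart():
    state["ok"] = not (0.2 <= state["last_delay"] < 0.3)

def fake_file_ok():
    return state["ok"]

delays = [i / 20 for i in range(10)]  # 0.0, 0.05, ..., 0.45 seconds
print(sweep_restart_timing(fake_restart, fake_file_ok, delays, sleep=fake_sleep))
```

On a real device, restart and file_ok would be replaced by whatever restarts the container and validates the file, and the sweep can be repeated with finer delays around any failing value.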