Error -71 that is fixed only after purge

Hello,

We have several systems running in production on balenaOS 2.98.33. The hardware is an embedded touch panel PC with an x86 11th Gen Intel(R) Core™ i3-1115G4 @ 3.00GHz.
Two cameras are connected to each system via USB, each consuming 2.1 W. The system is restarted from time to time (it depends on the user). It can sometimes run for days, weeks, or months without issues.

At random (we see no pattern), a USB issue appears and the camera restarts non-stop. We first suspected a USB hardware issue, but we decided to purge the services and install them again. After doing that, the system works correctly without any problems (using exactly the same hardware setup and service versions). Before the purge, even a power cycle did not help: the camera kept restarting non-stop (which also raises the temperature of the panel PC).

I attach the dmesg logs, which show the error -71 that this produces.
dmesg.txt (232.3 KB)
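
For anyone triaging a similar log: on Linux, errno 71 is EPROTO (protocol error). A quick way to see which USB ports are affected is to count the "error -71" lines per port. The sketch below is only a rough aid and assumes the usual kernel message shape ("usb 1-3: ..., error -71"); the function name `count_eproto` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, e.g.:
#   [ 1234.567890] usb 1-3: device descriptor read/64, error -71
# The port path ("1-3", "2-1.4", ...) is captured so errors can be grouped.
USB_ERR = re.compile(r"usb (?P<port>\d+-[\d.]+): .*error -71\b")


def count_eproto(lines):
    """Return a Counter mapping USB port path -> number of 'error -71' lines."""
    counts = Counter()
    for line in lines:
        match = USB_ERR.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts
```

It can be fed an open file directly, for example `count_eproto(open("dmesg.txt", errors="replace")).most_common()`, to list the ports with the most errors first.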

Another important piece of information is that we also have an issue where sometimes the audio just stops working and a purge solves that.

Can you provide any guidance on:

  • How could we replicate this issue quicker? (this is the most important step, since, without this, we cannot confirm any solution…)
  • What is causing the issue itself?
  • How to solve this?

The issue has left us quite stumped and we have no idea how to debug this.

I hope to hear from you soon.

Best regards,

Hello @alexandrepires5 thanks for the detailed message!

Could you please confirm the device type that you are using? Is it possible to access the devices by granting support access?

I contacted the OS team internally to see how we can help you more!

Hello @mpous ,

The device type is generic x86-64 mbr. Regarding access, it is not possible, since the setup is based on openBalena and our own infrastructure :frowning:

We are trying to find a way to replicate.

Best regards,

@alexandrepires5 this looks like an issue with the USB. Did you try other balenaOS versions? I know that you can’t do a hostOS update on openBalena, but did you try? Or with a different device type (e.g. generic x86 gpt)?

Hello @mpous. We have another system with slightly different hardware that uses the exact same cabling, cameras, software and OS, and the issue does not happen there.
We can try to update the OS directly to a newer version and see whether that fixes it. But because we cannot reproduce the issue on demand, we wouldn’t know for at least 3 months whether it was fixed.
The same would apply to the generic x86 gpt device type.

Still, it is weird that after a purge it works again without hiccups (until it happens again after a random time). Could some cached or persistent state / memory in the OS be causing the issue?
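
One way to test the persistent-state idea is to hash every file in the persisted directories on a healthy device and again on a failing one, then compare the two snapshots. This is only a sketch: `snapshot` and `diff` are hypothetical helper names, and which directory to scan depends on how the services store their data.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                digest = None  # unreadable file; recorded so it still shows in a diff
            digests[os.path.relpath(path, root)] = digest
    return digests


def diff(before, after):
    """Return (missing, added, changed) relative paths between two snapshots."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return missing, added, changed
```

Anything that shows up as missing or changed only on the failing device is a candidate for the state that the purge resets.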

Hello,

We managed to find the issue: exactly one of our files was being corrupted / deleted from the filesystem. This happens after restarting the container with a very specific timing. Sorry to disturb you with this issue! It can be considered closed.
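
For anyone who runs into something similar: a common safeguard against a file being left half-written when a container is stopped mid-write is to write to a temporary file and rename it into place. The sketch below is not the exact fix applied here; it assumes the file is rewritten by the application itself, that the filesystem is POSIX-like, and `atomic_write` is a made-up name.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace `path` with `data` so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the contents to disk before renaming
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a sudden power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that `tempfile.mkstemp` creates the file with owner-only permissions, so the mode may need to be adjusted if other users or services read the file.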

Best regards,

Thanks for sharing the solution @alexandrepires5

Feel free to share more!