How do I troubleshoot random reboots?

My custom board sometimes reboots. On the bench I can see a stack dump on the console, but how do I troubleshoot the less frequent reboots on devices deployed in production in the field?

The board is based on the STM32MP1, and the problem could be in the M4 code or on the A7 side (kernel, OS, or application code).

Could upgrading balenaOS help, or upgrading the kernel by moving the Yocto version from thud to dunfell?

Hello Ankan,
Can you share more details about which balena image you are using on your custom board? The list of officially supported devices is here: Single-board computers - Balena Documentation. Also, if you are trying to resolve a specific kind of issue, sharing details or logs about it would help folks help you.

There is no easy way to debug random reboots in production. Enabling persistent logging will let you go through the logs after the device comes back up. Whether those logs let you debug your specific problem depends on what exactly the problem is. I don’t think that simply upgrading balenaOS will automatically address your concern.
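Once persistent logging is on, one quick check is whether the previous boot ended with a clean shutdown or the journal simply stops mid-stream, which would point to a hard reset (watchdog, brownout, or a panic that never reached storage). The Python sketch below assumes `journalctl` can read the persistent host journal from wherever you run it (the balenaOS host itself may not ship Python), and its list of clean-shutdown messages is an assumption: the exact wording differs between systemd versions.

```python
#!/usr/bin/env python3
"""Sketch: did the previous boot's journal end with a clean shutdown?"""
from __future__ import annotations

import subprocess

# Assumption: messages that usually appear near the end of a clean shutdown.
# Exact wording differs between systemd versions; adjust for your image.
CLEAN_SHUTDOWN_MARKERS = (
    "Journal stopped",
    "Reached target Reboot",
    "Reached target System Reboot",
    "Reached target Shutdown",
    "Reached target System Shutdown",
)


def previous_boot_tail(lines: int = 50) -> list[str]:
    """Return the last `lines` journal entries of the previous boot (-b -1)."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(lines), "--no-pager"],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout.splitlines()


def ended_cleanly(tail: list[str]) -> bool:
    """True if any clean-shutdown marker appears in the journal tail."""
    return any(marker in line for line in tail for marker in CLEAN_SHUTDOWN_MARKERS)


def main() -> None:
    try:
        tail = previous_boot_tail()
    except (OSError, subprocess.SubprocessError) as err:
        # journalctl missing, timed out, or no journal kept for the previous boot.
        print(f"Could not read the previous boot: {err}")
        return
    if ended_cleanly(tail):
        print("Previous boot appears to have shut down cleanly.")
    else:
        print("No clean-shutdown marker; last lines before the reset:")
        print("\n".join(tail[-10:]))


if __name__ == "__main__":
    main()
```

If `journalctl -b -1` fails, there is usually no stored journal for the previous boot, which suggests the logs were still going to RAM.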

Hope that helps.

Thanks.

I have compiled my own image with Yocto thud, as there is no official support for my CPU.
I have enabled persistent logging, but I can’t see anything in the log.
If the problem were in Linux, I guess it would show up in the journal log, even if it was a watchdog timeout, so maybe the problem lies in the M4 code.
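To help tell an A7-side watchdog reset apart from an M4 problem, the kernel exposes two sysfs interfaces that may be useful: the watchdog `bootstatus` attribute (available with `CONFIG_WATCHDOG_SYSFS`) and the remoteproc `state` attribute for the M4 core. The sketch below decodes `bootstatus` using the flag values from `include/uapi/linux/watchdog.h`. Whether the STM32 watchdog driver in a thud-era kernel actually fills in `bootstatus` is an assumption; if it always reads 0, the reset cause has to come from elsewhere, such as the SoC's own reset status flags, which this sketch does not read.

```python
#!/usr/bin/env python3
"""Sketch: report watchdog boot status and remoteproc (M4) state from sysfs."""
from __future__ import annotations

from pathlib import Path

# Flag bits from the Linux watchdog API (include/uapi/linux/watchdog.h).
WDIOF_FLAGS = {
    0x0001: "WDIOF_OVERHEAT",
    0x0002: "WDIOF_FANFAULT",
    0x0004: "WDIOF_EXTERN1",
    0x0008: "WDIOF_EXTERN2",
    0x0010: "WDIOF_POWERUNDER",
    0x0020: "WDIOF_CARDRESET",  # the watchdog caused the previous reset
    0x0040: "WDIOF_POWEROVER",
}


def decode_bootstatus(value: int) -> list[str]:
    """Translate a bootstatus bitmask into flag names."""
    return [name for bit, name in WDIOF_FLAGS.items() if value & bit]


def read_attr(path: Path) -> str | None:
    """Read a sysfs attribute, or None if it is absent or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def main() -> None:
    for wd in sorted(Path("/sys/class/watchdog").glob("watchdog*")):
        raw = read_attr(wd / "bootstatus")
        if raw is None:
            print(f"{wd.name}: no bootstatus attribute (CONFIG_WATCHDOG_SYSFS?)")
            continue
        flags = decode_bootstatus(int(raw))
        print(f"{wd.name}: bootstatus={raw} {flags or 'no flags set'}")
    for rp in sorted(Path("/sys/class/remoteproc").glob("remoteproc*")):
        # 'name' may be missing on older kernels; 'state' is e.g. running/offline/crashed.
        print(f"{rp.name} ({read_attr(rp / 'name')}): state={read_attr(rp / 'state')}")


if __name__ == "__main__":
    main()
```

A remoteproc state of `crashed` would point at the M4 firmware; `running` after an unexpected reboot says little on its own, since the firmware may simply have been started again at boot.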

What exact device is this, @Ankan, including the model number? Is it an Avenger96, STM Discovery, Olimex, etc.?

@dtischler it’s a QSMP-1570 based board.

Ok, that is not one I am familiar with… The best advice I can think of is to enable persistent logging so that you can go back and review the logs after a reboot (otherwise they are lost, as balenaOS logs to RAM by default). Also hook up a serial UART console and leave it running so you can see what the kernel error is. Finally, double-check whether the vendor provides a newer kernel version, in case there were in fact issues with their vendor branch.
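Because the board's own logs can be lost in a hard reset, the serial console is best captured by a second machine. The following is a minimal Python sketch of such a logger, assuming a USB-UART adapter on the capture host and the common 115200 8N1 console settings (check your board's actual baud rate). It timestamps every line with the host clock so reboots can be correlated with wall time, and echoes lines matching a hypothetical list of crash markers to stderr. It uses only the standard library (`termios`/`tty`, so POSIX only) rather than pyserial.

```python
#!/usr/bin/env python3
"""Sketch: capture a UART console to a file, timestamping every line."""
from __future__ import annotations

import datetime
import os
import sys
import termios
import tty

BAUD = termios.B115200  # assumption: common STM32MP1 console speed
# Hypothetical starting list of substrings worth flagging on stderr.
CRASH_MARKERS = ("Kernel panic", "Internal error", "Unable to handle",
                 "Oops", "Backtrace", "Call trace", "watchdog")


def stamp(line: str, now: datetime.datetime | None = None) -> str:
    """Prefix a line with a host-side timestamp (millisecond precision)."""
    now = now or datetime.datetime.now()
    return f"[{now.isoformat(timespec='milliseconds')}] {line}"


def is_crash_line(line: str) -> bool:
    """Case-insensitive match against CRASH_MARKERS."""
    lowered = line.lower()
    return any(marker.lower() in lowered for marker in CRASH_MARKERS)


def open_uart(path: str, baud: int) -> int:
    """Open the serial device read-only in raw mode at the given speed."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    tty.setraw(fd)  # no echo or line editing; 8-bit clean
    attrs = termios.tcgetattr(fd)
    attrs[4] = attrs[5] = baud  # input and output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def capture(device: str, logfile: str) -> None:
    """Append timestamped console lines to logfile until the device goes away."""
    with os.fdopen(open_uart(device, BAUD), "rb", buffering=0) as uart, \
            open(logfile, "a", encoding="utf-8") as log:
        pending = b""
        while chunk := uart.read(1024):  # blocks until at least one byte arrives
            pending += chunk
            *complete, pending = pending.split(b"\n")  # keep any partial line
            for raw in complete:
                text = raw.decode("utf-8", errors="replace").rstrip("\r")
                line = stamp(text)
                log.write(line + "\n")
                log.flush()  # keep the file current in case the capture host dies too
                if is_crash_line(text):
                    print(line, file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} DEVICE LOGFILE", file=sys.stderr)
    else:
        capture(sys.argv[1], sys.argv[2])
```

Run it as, for example, `python3 uart_log.py /dev/ttyUSB0 console.log`; the script name and device path are only illustrative.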