Host-OS reboot after 243 sec on development image

I get the dump below after 243 seconds on my STM32MP board running a balenaOS 2.58.0 development image. With the production image I don't get this reboot.

The reboot can be avoided by running the following command after boot, before the 243 seconds have passed:

    echo 0 > /proc/sys/kernel/hung_task_timeout_secs

…but I'd like to understand what's wrong, not just hide it. Or, if I just have to live with it, where should I add that command so it runs automatically on boot?

    [  243.679640] INFO: task kworker/0:2:71 blocked for more than 120 seconds.
    [  243.684885]       Tainted: G           O      4.19.49 #1
    [  243.690319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [  243.697999] kworker/0:2     D    0    71      2 0x00000000
    [  243.703871] Workqueue: events_freezable mmc_rescan
    [  243.708570] [<c08915b8>] (__schedule) from [<c0891a8c>] (schedule+0x54/0xc0)
    [  243.715791] [<c0891a8c>] (schedule) from [<c0895474>] (schedule_timeout+0x1c8/0x280)
    [  243.723188] [<c0895474>] (schedule_timeout) from [<c0892748>] (wait_for_common+0x144/0x16c)
    [  243.731456] [<c0892748>] (wait_for_common) from [<c063c57c>] (mmc_wait_for_req_done+0x8c/0x108)
    [  243.740115] [<c063c57c>] (mmc_wait_for_req_done) from [<c063cfb8>] (mmc_wait_for_cmd+0x6c/0xa0)
    [  243.748746] [<c063cfb8>] (mmc_wait_for_cmd) from [<c0647e2c>] (mmc_io_rw_direct_host+0x94/0x128)
    [  243.757586] [<c0647e2c>] (mmc_io_rw_direct_host) from [<c06482f4>] (sdio_reset+0x38/0x84)
    [  243.765724] [<c06482f4>] (sdio_reset) from [<c063ec60>] (mmc_rescan+0x3a4/0x3e8)
    [  243.773217] [<c063ec60>] (mmc_rescan) from [<c0133f2c>] (process_one_work+0x1f0/0x400)
    [  243.781271] [<c0133f2c>] (process_one_work) from [<c0134d4c>] (worker_thread+0x44/0x580)
    [  243.789259] [<c0134d4c>] (worker_thread) from [<c0139e78>] (kthread+0x148/0x150)
    [  243.796691] [<c0139e78>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    [  243.803838] Exception stack(0xd1fddfb0 to 0xd1fddff8)
    [  243.808783] dfa0:                                     00000000 00000000 00000000 00000000
    [  243.817299] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [  243.825117] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [  243.831882] NMI backtrace for cpu 0
    [  243.835139] CPU: 0 PID: 25 Comm: khungtaskd Tainted: G           O      4.19.49 #1
    [  243.842642] Hardware name: STM32 (Device Tree Support)
    [  243.847793] [<c010f45c>] (unwind_backtrace) from [<c010bb7c>] (show_stack+0x10/0x14)
    [  243.855514] [<c010bb7c>] (show_stack) from [<c087cfac>] (dump_stack+0xb4/0xc8)
    [  243.862725] [<c087cfac>] (dump_stack) from [<c0883278>] (nmi_cpu_backtrace+0x90/0xc4)
    [  243.870543] [<c0883278>] (nmi_cpu_backtrace) from [<c0883410>] (nmi_trigger_cpumask_backtrace+0x164/0x1a4)
    [  243.880185] [<c0883410>] (nmi_trigger_cpumask_backtrace) from [<c01b1dfc>] (watchdog+0x2d4/0x3d4)
    [  243.889047] [<c01b1dfc>] (watchdog) from [<c0139e78>] (kthread+0x148/0x150)
    [  243.895994] [<c0139e78>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    [  243.903195] Exception stack(0xd2203fb0 to 0xd2203ff8)
    [  243.908237] 3fa0:                                     00000000 00000000 00000000 00000000
    [  243.916406] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [  243.924570] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [  243.931496] Sending NMI from CPU 0 to CPUs 1:
    [  243.935911] NMI backtrace for cpu 1
    [  243.935918] CPU: 1 PID: 1246 Comm: balena-engine-c Tainted: G           O      4.19.49 #1
    [  243.935922] Hardware name: STM32 (Device Tree Support)
    [  243.935925] PC is at 0x16ab98
    [  243.935927] LR is at 0xad7ac
    [  243.935931] pc : [<0016ab98>]    lr : [<000ad7ac>]    psr: 80070010
    [  243.935935] sp : 02b6ee90  ip : 000000f1  fp : 02b48009
    [  243.935938] r10: 02801a40  r9 : 028729a4  r8 : 00000008
    [  243.935942] r7 : 00000083  r6 : 00000000  r5 : 0000001f  r4 : 00000008
    [  243.935946] r3 : 00000009  r2 : 02c88270  r1 : 02acc1b0  r0 : 02b48009
    [  243.935950] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
    [  243.935954] Control: 10c5387d  Table: d0be006a  DAC: 00000055
    [  243.935959] CPU: 1 PID: 1246 Comm: balena-engine-c Tainted: G           O      4.19.49 #1
    [  243.935963] Hardware name: STM32 (Device Tree Support)
    [  243.935967] [<c010f45c>] (unwind_backtrace) from [<c010bb7c>] (show_stack+0x10/0x14)
    [  243.935971] [<c010bb7c>] (show_stack) from [<c087cfac>] (dump_stack+0xb4/0xc8)
    [  243.935976] [<c087cfac>] (dump_stack) from [<c08832a8>] (nmi_cpu_backtrace+0xc0/0xc4)
    [  243.935980] [<c08832a8>] (nmi_cpu_backtrace) from [<c010e520>] (handle_IPI+0xd8/0x200)
    [  243.935985] [<c010e520>] (handle_IPI) from [<c043d730>] (gic_handle_irq+0x8c/0x90)
    [  243.935989] [<c043d730>] (gic_handle_irq) from [<c0101df0>] (__irq_usr+0x50/0x80)
    [  243.935992] Exception stack(0xd0dcffb0 to 0xd0dcfff8)
    [  243.935997] ffa0:                                     02b48009 02acc1b0 02c88270 00000009
    [  243.936002] ffc0: 00000008 0000001f 00000000 00000083 00000008 028729a4 02801a40 02b48009
    [  243.936006] ffe0: 000000f1 02b6ee90 000ad7ac 0016ab98 80070010 ffffffff
    [  243.936831] Kernel panic - not syncing: hung_task: blocked tasks
    [  244.093726] CPU: 0 PID: 25 Comm: khungtaskd Tainted: G           O      4.19.49 #1
    [  244.101268] Hardware name: STM32 (Device Tree Support)
    [  244.106429] [<c010f45c>] (unwind_backtrace) from [<c010bb7c>] (show_stack+0x10/0x14)
    [  244.114139] [<c010bb7c>] (show_stack) from [<c087cfac>] (dump_stack+0xb4/0xc8)
    [  244.121347] [<c087cfac>] (dump_stack) from [<c011c334>] (panic+0xf4/0x26c)
    [  244.128213] [<c011c334>] (panic) from [<c01b1e08>] (watchdog+0x2e0/0x3d4)
    [  244.134991] [<c01b1e08>] (watchdog) from [<c0139e78>] (kthread+0x148/0x150)
    [  244.141938] [<c0139e78>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    [  244.149139] Exception stack(0xd2203fb0 to 0xd2203ff8)
    [  244.154181] 3fa0:                                     00000000 00000000 00000000 00000000
    [  244.162349] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [  244.170514] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [  244.177128] CPU1: stopping
    [  244.179813] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O      4.19.49 #1
    [  244.187188] Hardware name: STM32 (Device Tree Support)
    [  244.192328] [<c010f45c>] (unwind_backtrace) from [<c010bb7c>] (show_stack+0x10/0x14)
    [  244.200054] [<c010bb7c>] (show_stack) from [<c087cfac>] (dump_stack+0xb4/0xc8)
    [  244.207265] [<c087cfac>] (dump_stack) from [<c010e620>] (handle_IPI+0x1d8/0x200)
    [  244.214652] [<c010e620>] (handle_IPI) from [<c043d730>] (gic_handle_irq+0x8c/0x90)
    [  244.222209] [<c043d730>] (gic_handle_irq) from [<c0101a0c>] (__irq_svc+0x6c/0xa8)
    [  244.229671] Exception stack(0xd3085f78 to 0xd3085fc0)
    [  244.234711] 5f60:                                                       00000000 00051f90
    [  244.242884] 5f80: d3af548c c01179e0 ffffe000 c15064b0 c15064f0 00000002 c153e822 c0a96af0
    [  244.251049] 5fa0: 00000000 00000000 c1510740 d3085fc8 c0108b3c c0108b40 60070013 ffffffff
    [  244.259220] [<c0101a0c>] (__irq_svc) from [<c0108b40>] (arch_cpu_idle+0x38/0x3c)
    [  244.266604] [<c0108b40>] (arch_cpu_idle) from [<c0146e80>] (do_idle+0x118/0x158)
    [  244.273986] [<c0146e80>] (do_idle) from [<c0147180>] (cpu_startup_entry+0x18/0x20)
    [  244.281544] [<c0147180>] (cpu_startup_entry) from [<c010242c>] (__enable_mmu+0x0/0x14)
    [  244.289452] Rebooting in 1 seconds..
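On a systemd-based system, the usual place for a sysctl that must apply at every boot is a drop-in file under /etc/sysctl.d/, which systemd-sysctl reads early during boot. A minimal sketch of the equivalent of the echo command above, assuming the file is added to the host image at build time (for example from a Yocto recipe), since the balenaOS host root filesystem is read-only at runtime. The file name is hypothetical:

```text
# /etc/sysctl.d/90-hung-task.conf (hypothetical file name)
# Same effect as: echo 0 > /proc/sys/kernel/hung_task_timeout_secs
kernel.hung_task_timeout_secs = 0

# Alternative: keep the hung-task warning in the log, but do not panic and reboot
# kernel.hung_task_panic = 0
```

The commented alternative keeps the diagnostic message, which is useful while the underlying hang is still being investigated.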

Hello Anders, I certainly have not seen that behavior before, so I have pinged a few folks to see if they have any insight. Which STM32MP1 board are you using, the Discovery Kit?

Hi, the board is a custom board with a Ka-Ro QSMP-1570 module.

The image is built on Yocto Thud.

Hi Anders, it turns out this is the kernel's hung-task detector, configured in lib/Kconfig.debug in the Linux kernel source tree, so you could change that behavior in your kernel configuration (the CONFIG_DETECT_HUNG_TASK and CONFIG_BOOTPARAM_HUNG_TASK_PANIC options) or override it in your recipe.

However, you might be best served by investigating the source of the hangs, as they could cause issues later if they are not truly addressed. Hope that helps, thanks.
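The line "Kernel panic - not syncing: hung_task: blocked tasks" in the log shows that the development kernel runs with hung-task panic enabled, which is what turns a warning into a reboot. A kernel config fragment along these lines would keep the warning but drop the panic. This is a sketch: the values are illustrative, how a fragment is merged depends on the kernel recipe (not shown here), and whether the panic on the development image comes from this Kconfig option or from a boot/sysctl setting is an assumption to check (`sysctl kernel.hung_task_panic` shows the runtime value).

```text
# Keep the hung-task detector and its 120 s warning (a timeout of 0 disables the check)
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# Do not panic (and therefore do not reboot) when a hung task is found
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
```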

Hi, I understand that the problem is that one task is taking too much time, but I can't tell from the dump which one to investigate.
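In a hung-task report, the task to investigate is the one named on the "blocked for more than 120 seconds" line: here kworker/0:2 (PID 71), a kernel worker running mmc_rescan from the events_freezable workqueue. Its backtrace is printed innermost first, and the frames just below the scheduler show it stuck in mmc_wait_for_req_done, waiting for a command issued by sdio_reset during an MMC rescan to complete. The later backtraces (khungtaskd, and balena-engine on CPU 1) belong to the detector's own panic path and to whatever happened to be running on the other CPU. A small helper that pulls those lines out of a saved log might look like this; the function name is hypothetical and it assumes the ARM backtrace format shown in the log above.

```shell
# hung_task_summary: hypothetical helper, not part of balenaOS or the kernel.
# Reads a saved kernel log on stdin. For each hung-task report it prints the
# blocked task (name:pid), its workqueue if it is a kworker, and up to five
# call frames below the generic scheduler/wait frames (innermost first).
hung_task_summary() {
    awk '
        /blocked for more than/ {
            # "INFO: task kworker/0:2:71 blocked for more than 120 seconds."
            sub(/^.*INFO: task /, "")
            sub(/ blocked for more than.*$/, "")
            print "task: " $0
            in_report = 1
            frames = 0
            next
        }
        in_report && /Exception stack/ {
            # The backtrace of the blocked task ends here
            in_report = 0
            next
        }
        in_report && /Workqueue: / {
            sub(/^.*Workqueue: /, "")
            print "workqueue: " $0
            next
        }
        in_report && /\) from \[</ {
            # ARM frames read "[<addr>] (callee) from [<addr>] (caller+off/size)",
            # so the first parenthesised name is the function at this depth.
            name = $0
            sub(/^[^(]*\(/, "", name)
            sub(/\).*$/, "", name)
            # Skip frames that only say the task is sleeping
            if (name ~ /schedule/ || name ~ /^wait_for/) next
            if (frames < 5) {
                print "  " name
                frames++
            }
        }
    '
}
```

Usage: `dmesg | hung_task_summary`, or `hung_task_summary < saved-log.txt`. For the log in this thread it points at the MMC stack (mmc_wait_for_req_done, mmc_wait_for_cmd, mmc_io_rw_direct_host, sdio_reset, mmc_rescan), i.e. the SD/MMC host side rather than storage write-back.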

Hi Anders, one way to narrow down the problem might be to increase the hung-task timeout and lower the amount of dirty data the kernel may cache (so that cached writes are flushed sooner and in smaller batches), using the following:

    sysctl -w kernel.hung_task_timeout_secs=300
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10

Hope that helps!

Hi,

The problem was that my new board has no card detection on its empty SD card socket, so the card-detect pin floats. balenaOS runs from eMMC instead.

I tried to fix this by deleting the cd-gpios property from the device tree, both with and without broken-cd, but that didn't help.

I solved it temporarily by changing cd-gpios from GPIO_ACTIVE_LOW to GPIO_ACTIVE_HIGH, but I don't think that's a real solution. I would prefer to enable the pull-up bias on that pin instead, since it is currently floating.

So, how do I enable the pull-up on that pin, or is there a better way to solve this?
I can change the hardware in a later revision, but I need this board until then.

The section looks like this:

    &sdmmc1 {
        arm,primecell-periphid = <0x10153180>;
        pinctrl-names = "default", "opendrain", "sleep";
        pinctrl-0 = <&sdmmc1_pins_mx>;
        pinctrl-1 = <&sdmmc1_pins_opendrain_mx>;
        pinctrl-2 = <&sdmmc1_pins_sleep_mx>;
        bus-width = <4>;
        vmmc-supply = <&reg_3v3>;
        cd-gpios = <&gpiob 7 GPIO_ACTIVE_LOW>;
        no-1-8-v;
        st,neg-edge;
        status = "okay";
    };

Hi, glad to learn you have found the root cause of the problems. The broken-cd attribute is what is usually used to inform the kernel that there is no card detection and polling should be used. It’s strange that this is not working in your case.
To answer your question, the pad settings (including pull-up/pull-down bias) are usually configured in the pin-control (IOMUX) device tree entries, which in your case are sdmmc1_pins_mx, sdmmc1_pins_opendrain_mx and sdmmc1_pins_sleep_mx, used for the default, open-drain and sleep pinctrl states respectively.
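Following that approach, one untested way to get a pull-up on PB7 is to give the card-detect pin its own pin group with bias-pull-up and list it in each sdmmc1 pinctrl state. This is a sketch under assumptions: the sdmmc1_cd_* label and node names are hypothetical, bias-pull-up and STM32_PINMUX(port, line, GPIO) come from the STM32 pinctrl binding, and whether the 4.19 kernel lets the same pin sit in a pinctrl group while also being requested through cd-gpios still has to be verified on the board (the pinctrl entries in debugfs show the resulting pin configuration).

```dts
/* Hypothetical group: PB7 (SD card detect) as a GPIO with the internal pull-up */
&pinctrl {
	sdmmc1_cd_pins_mx: sdmmc1_cd_mx-0 {
		pins {
			pinmux = <STM32_PINMUX('B', 7, GPIO)>;
			bias-pull-up;
		};
	};
};

/* Extends the existing &sdmmc1 node (or edit that node directly); the group is
 * listed in every state so the pull-up stays on in open-drain and sleep too */
&sdmmc1 {
	pinctrl-0 = <&sdmmc1_pins_mx &sdmmc1_cd_pins_mx>;
	pinctrl-1 = <&sdmmc1_pins_opendrain_mx &sdmmc1_cd_pins_mx>;
	pinctrl-2 = <&sdmmc1_pins_sleep_mx &sdmmc1_cd_pins_mx>;
	cd-gpios = <&gpiob 7 GPIO_ACTIVE_LOW>;
};
```

With the pull-up, an empty socket reads high, which with GPIO_ACTIVE_LOW means "no card", so the original polarity can stay as it was.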

Now that your balenaOS port seems nearly finished, what are your long-term plans for this board? A custom OS will not be able to use all the features of balenaCloud, so let us know if you are thinking about adding this board to balenaCloud so it appears in your dashboard. Basically, you could add the board as a community-supported board (see Device Types and Versioning - Balena Documentation) and we would work with you on preparing a repository under the balena-os GitHub organization where you could PR your changes.

Hello, just wanted to check in and see how this is going. Our OS team would be happy to work with you to turn this into a community-supported board.