stuck in deployment loop

katmai · March 13, 2020, 6:01pm

hi guys,

i have 10 pi4 devices. we pushed an update last night and 1 of our devices is stuck in an app deployment loop.

so far i checked the sd card space, rebooted the pi, and tried stopping and restarting the container manually, but it stays stuck.

i even tried upgrading the hostOS to the latest version (was on 2.47.0, now on 2.47.1) but still nothing.

the other machines are fine.
https://dashboard.balena-cloud.com/devices/8c86b19de34a413f6483a06aa02205a5/summary

Can you please help me figure this out?

Ereski · March 13, 2020, 6:03pm

Hi. Could you enable support access for that device so we can take a look? And by stuck, what do you mean?

katmai · March 13, 2020, 6:18pm

support access enabled.

i mean the device downloads the image, then by the time it reaches 99% it starts downloading the new release again, over and over. i think i saw it do that 6-7 times today.

Ereski · March 13, 2020, 6:35pm

balenaEngine is getting killed by the watchdog, and when that happens during an application update it usually means that the SD card is not coping well with the increased load. We recommend SanDisk Extreme Pro cards in these cases, as in our experience they work well. Do you know if all the Pis are using the same SD card model?

katmai · March 14, 2020, 9:00am

yeah all the pis are using the same sd card model. might mean this card is about to fail, or could there be something else? all pis are using the same card, pi model and application, so the load should be pretty much identical on all of them.

nghiant2710 · March 16, 2020, 9:53am

Hi,

Something looks off with your device. Checking it now, there is no space left:

root@8c86b19:/mnt/data# df -h
Filesystem                         Size  Used Avail Use% Mounted on
devtmpfs                           1.8G     0  1.8G   0% /dev
/dev/disk/by-partuuid/a1fb009e-02  300M  295M     0 100% /mnt/sysroot/active
/dev/disk/by-label/resin-state      19M  226K   17M   2% /mnt/state
overlay                            300M  295M     0 100% /
tmpfs                              1.9G     0  1.9G   0% /dev/shm
tmpfs                              1.9G   37M  1.9G   2% /run
tmpfs                              1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs                              1.9G     0  1.9G   0% /tmp
tmpfs                              1.9G  276K  1.9G   1% /var/volatile
/dev/mmcblk0p1                      40M  8.2M   32M  21% /mnt/boot
/dev/mmcblk0p6                      29G   29G     0 100% /mnt/data

and it looks like balenaEngine is taking up all the disk space:

root@8c86b19:/mnt/data# du -h -d1
12K     ./lost+found
26K     ./resinhup
34M     ./root-overlay
66K     ./resin-data
29G     ./docker
29G     .

We can help you clean up and try to restart the engine, so please let me know if we have your permission to do so.
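
The same check can be scripted for the rest of a fleet. This is a minimal sketch, not a balena tool: `full_mounts` and its 95% default are hypothetical names and values chosen for illustration. It reads `df`-style output in the column layout shown above (usage in column 5, mount point in column 6) and prints the mounts at or above the threshold.

```shell
#!/bin/sh
# Print mount points whose usage is at or above a threshold (percent).
# Reads df-style output on stdin, so it also works on saved output.
# full_mounts and the 95% default are hypothetical, not part of balenaOS.
full_mounts() {
    threshold="${1:-95}"
    awk -v t="$threshold" 'NR > 1 {      # NR > 1 skips the header row
        use = $5
        sub(/%/, "", use)                # "100%" -> "100"
        if (use + 0 >= t) print $6 " " $5
    }'
}

# Example on a device (assumption: run from the host OS shell):
#   df -P | full_mounts 95
```

The sketch assumes mount points contain no spaces; `df -P` keeps every filesystem on a single line, which the column-based parsing relies on.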

katmai · March 16, 2020, 12:40pm

yeah definitely looks like something filled up /mnt/data. our app hasn’t been running so it’s not that. go ahead please.

nazrhom · March 16, 2020, 6:38pm

Hello, the device seems to have recovered and is downloading the application now. As Trong mentioned, there seemed to be a lot of unused docker layers. I tried some cleanup to remove the dangling ones specifically, but balenaEngine still could not start on the device.
I ended up removing the supervisor and application images, which finally allowed the balena service to start and begin downloading the new release correctly. Since the device had to re-download the supervisor, I also took the liberty of updating it from 10.6.27 to 10.8.0. The latest version has a lot of crucial improvements in stability and error reporting, and we take any chance we get to update devices to it. Let me know if that is ok with you.
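
For readers wondering what removing dangling layers can look like, here is a hedged sketch. It is not necessarily what support ran on this device. It assumes `balena-engine` accepts the same image subcommands as the Docker CLI, and the `ENGINE` variable and the two function names are hypothetical helpers added for illustration.

```shell
#!/bin/sh
# Assumption: balena-engine accepts the same image subcommands as the Docker CLI.
# ENGINE can be overridden (e.g. ENGINE=echo) to preview the commands safely.
ENGINE="${ENGINE:-balena-engine}"

# List the IDs of dangling images (untagged and not referenced by any other image).
list_dangling() {
    "$ENGINE" images --filter dangling=true --quiet
}

# Remove dangling images without prompting for confirmation.
prune_dangling() {
    "$ENGINE" image prune --force
}

# Example on a device:
#   list_dangling
#   prune_dangling
```

Both commands talk to the engine daemon, so they only help while the engine is running. On this device the engine could not start, which is why the cleanup had to go further.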

katmai · March 16, 2020, 6:50pm

that is very okay, thanks. should i update the supervisor on all other devices or just leave it as is?

katmai · March 16, 2020, 7:00pm

I don’t think it worked. it’s still stuck downloading. it was at 7% a minute ago and now it’s at 2%.

ab77 · March 16, 2020, 8:50pm

Hi there, we’ve taken a look at the device and it appears to be struggling with disk I/O. Running the device diagnostics shows rather high disk write latency (4s to write 1 MB):

Slow disk writes detected: mmcblk0: 4101.79ms / write, sample size 980673
mmcblk0p1: 3390.78ms / write, sample size 9
mmcblk0p2: 1483.47ms / write, sample size 34
mmcblk0p3: 1696.64ms / write, sample size 176
mmcblk0p5: 1677.99ms / write, sample size 10134
mmcblk0p6: 4127.64ms / write, sample size 970320

The best course of action at this point would be to replace the SD card in this device and run the health check again to verify disk write latency is reasonable.

For reference, we’ve had very few problems with SanDisk Extreme PRO cards.
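
To compare cards across devices, the diagnostics lines above can be filtered with a small script. This is only a sketch: `slow_writers` and the 1000 ms default limit are hypothetical choices, not thresholds used by the balena diagnostics. It expects lines in the same shape as the output quoted above.

```shell
#!/bin/sh
# Print block devices whose reported write latency exceeds a limit (ms).
# Expects lines shaped like "mmcblk0p6: 4127.64ms / write, sample size 970320".
# slow_writers and the 1000 ms default are hypothetical, for illustration only.
slow_writers() {
    limit_ms="${1:-1000}"
    # Drop the "Slow disk writes detected:" prefix when it shares a line with data.
    sed 's/^Slow disk writes detected: *//' |
    awk -v limit="$limit_ms" '/ms \/ write/ {
        dev = $1; sub(/:$/, "", dev)     # "mmcblk0:"   -> "mmcblk0"
        ms = $2;  sub(/ms$/, "", ms)     # "4101.79ms"  -> "4101.79"
        if (ms + 0 > limit) print dev " " ms
    }'
}

# Example (assumption: the diagnostics output was saved to diagnostics.txt):
#   slow_writers 2000 < diagnostics.txt
```

With a limit of 2000 ms, only `mmcblk0`, `mmcblk0p1` and `mmcblk0p6` from the output above would be listed. `mmcblk0p6` is the data partition, per the `df` output earlier in the thread.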

katmai · March 16, 2020, 9:09pm

got it. will do that. thanks!

katmai · March 19, 2020, 6:57pm

just wanted to give an update. we didn’t do anything to the machine and it just recovered. we didn’t swap the sd card. it’s weird. do you guys have any idea what could have caused it to recover?

Ereski · March 19, 2020, 7:04pm

The update might have proceeded in just the right way for your SD card to handle it without running into any other problems. On the next update, keep an eye out to see whether this issue reappears.
