Raspberry Pi 4 - Keeps restarting upload

Deploys to my Raspberry Pi 4 (device type: “Raspberry Pi 4 (using 64bit OS) (BETA)”) keep restarting…

If I deploy one image, all is OK. But when I try to deploy multiple images (including one large image), the upload of the large image never completes; it just goes round and round in a loop, uploading the image over and over.

Hi there, that sounds strange indeed. Can you enable support access and share the UUID of the device with us? You can do it either in a PM or just paste it here. We will have a look at the logs and see what might be the cause of this.

Thanks - I will get a device set up that is reloading constantly for you (currently I have had to deploy by adding one image at a time) and then send you the UUID…

Much appreciated

Hi

The UUID of the device is:
4f61308f597ec8a9a736dd634e49fbbe

It's in restarting mode at the moment - it gets halfway through, then restarts all the containers at the same time…

Hi @walpoletim, we’ve been taking a look at your device, but it is incredibly slow to navigate and read log files on. Do you have another SD card you could try?

We’ve run the diagnostic checks on your device and the check_write_latency check is failing, showing slow disk writes. Additionally, dmesg shows the following errors:

[167355.950212] INFO: task kworker/2:3:6610 blocked for more than 120 seconds.
[167355.957749]       Tainted: G         C        4.19.66 #1
[167355.964054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[167355.972792] kworker/2:3     D    0  6610      2 0x00000028
[167355.978810] Workqueue: events_freezable mmc_rescan
[167355.984161] Call trace:
[167355.987126]  __switch_to+0xa8/0xe8
[167355.991862]  __schedule+0x254/0x850
[167355.995572]  schedule+0x38/0x98
[167355.998878]  __mmc_claim_host+0xb8/0x200
[167356.002980]  mmc_get_card+0x38/0x48
[167356.006632]  mmc_sd_detect+0x24/0x90
[167356.010374]  mmc_rescan+0xd0/0x370
[167356.013916]  process_one_work+0x1ec/0x458
[167356.018115]  worker_thread+0x48/0x430
[167356.021927]  kthread+0x130/0x138
[167356.025351]  ret_from_fork+0x10/0x1c

Give another SD card a try and let us know what happens. I hope this helps!
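In the meantime, if you want to sanity-check a card yourself, a rough sequential write test from the host OS shell is usually enough to spot a struggling card. This is just a sketch - the file path is an example (adjust it to wherever your data partition is mounted), and dd's summary output varies between GNU and BusyBox builds:

# Quick-and-dirty sequential write test; writes ~100 MB with a
# final sync, then cleans up. A healthy card should report well
# over 10 MB/s; one that takes minutes points at the storage.
dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=100 conv=fsync
rm /mnt/data/ddtest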

Amazing, thanks - I will get the card swapped and let you know…

Tim

Hi

I have swapped to a new card, and it's still the same issue…

This is the new UUID:
137e61a08fcb56db13a8fa1f9e5c72d2

The network is fine - we are getting a constant 100 Mb down on fibre…

The Pi is directly connected to the Ethernet switch.

Hi,

Can you please share which SD card make/model you are using? We recommend the SanDisk Extreme Pro SD cards.

I took a look at the device.
The balenaEngine healthcheck times out (after 6 minutes). This restarts the engine and the supervisor, which in turn restarts the download.

The reason the balenaEngine healthcheck times out feels like a slow SD card. For example:

On a balenaFin (which is based on the Pi CM3):

root@7ac2b83:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m3.959s
user    0m0.643s
sys     0m0.122s
root@7ac2b83:~# 

The healthcheck is consistent and completes in under 5 seconds…

But on your device, the time varies and can go up quite a bit:

root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m30.664s
user    0m0.233s
sys     0m0.108s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m1.853s
user    0m0.247s
sys     0m0.082s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m1.870s
user    0m0.225s
sys     0m0.131s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m1.898s
user    0m0.261s
sys     0m0.088s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m1.868s
user    0m0.256s
sys     0m0.080s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m12.315s
user    0m0.235s
sys     0m0.119s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m14.328s
user    0m0.220s
sys     0m0.114s
root@137e61a:~# time bash -x /usr/lib/balena/balena-healthcheck 
+ set -o errexit
+ balena info
+ balena ps
+ balena run --rm --log-driver none --network none hello-world

real    0m8.249s
user    0m0.236s
sys     0m0.108s
root@137e61a:~# 
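If you want to quantify that variance, a simple loop like this (just a sketch; it assumes the host OS shell is bash, which the traces above suggest) collects a batch of timings in one go:

# Run the healthcheck 20 times so slow outliers like the 30 s
# run above stand out immediately.
for i in $(seq 1 20); do
    echo "run $i:"
    time bash /usr/lib/balena/balena-healthcheck >/dev/null
done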

The Pi 4 is pretty new and still going through various improvements, so it could be that a firmware bump fixes some SD card read/write behaviour. Or it could be the make/model of your SD cards.
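For what it's worth, you can check which firmware builds a device is running from the host OS shell (this assumes the Raspberry Pi userland tools are present, which they normally are on Pi images):

# Pi 4 EEPROM bootloader build - the component that has been
# receiving frequent fixes since launch:
vcgencmd bootloader_version
# VideoCore firmware build shipped with the OS image:
vcgencmd version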

Regards
ZubairLK

Hi

The cards are SanDisk Extreme (34 GB)

Tim

Hmm. This is strange…

Any thoughts would be good…

Is there any way of increasing the timeouts?

Having to deploy container by container is a real pain.

Thanks

Tim

Increasing the timeout at runtime is on the feature/issue list, but it is not implemented yet.
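That said, if you are comfortable experimenting, the healthcheck timeout should live in the engine's systemd unit, so in principle a drop-in override can raise it. This is an unsupported sketch - the unit name balena.service and the WatchdogSec mechanism are my assumptions here, and the change may not survive a host OS update:

# Unsupported sketch: raise the engine watchdog timeout via a
# systemd drop-in (assumes the 6-minute timeout is WatchdogSec
# on balena.service; verify with `systemctl cat balena.service`).
mkdir -p /etc/systemd/system/balena.service.d
cat > /etc/systemd/system/balena.service.d/watchdog.conf <<'EOF'
[Service]
WatchdogSec=900
EOF
systemctl daemon-reload
systemctl restart balena.service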

I’ve stopped the supervisor and stopped the mongo container.
Our diagnostics still report slow write latency:

check_write_latency	Failed	Slow disk writes detected: mmcblk0: 4235.17 ms/write (sample size 160009), mmcblk0p5: 3593.36 ms/write (sample size 85), mmcblk0p6: 4236.77 ms/write (sample size 159875)
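(For anyone following along: you can derive comparable numbers yourself from /proc/diskstats, though note this gives the average since boot rather than a sampled window like the diagnostic uses.)

# Average ms per completed write for each mmcblk device, using
# /proc/diskstats (field 8 = writes completed, field 11 = ms
# spent writing, both cumulative since boot).
awk '$3 ~ /^mmcblk/ && $8 > 0 {
    printf "%s: %.2f ms/write over %d writes\n", $3, $11/$8, $8
}' /proc/diskstats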

It still feels like something in this Pi 4 firmware/kernel/SD card combination is off.

Could it be a fake SD card?

Hi

Unlikely - it was purchased in the UK from a camera shop…

As I said, I have used this card in the Pi 3 and had no issues…

Would it be worth looking at the Pi 3's logs to see if it is showing the same issue?

I have put the old card from the Pi 4 into the Pi 3, so you should be able to see how those images came down this morning…

This is the Pi 3's UUID:
c57355a26f73cb150eb3e9001690bb20

Can you push the same mongo application to this Pi 3?

At this point, I suspect it could be a Pi 4 firmware issue that I'm unaware of. Large network downloads were something that was being fixed, but I can't remember when that fix went in.

Ahhh - that makes sense. It fails more with large containers (one is 2 GB, as it has Rust, Python, SciPy, NumPy and snips_nlp)…

What options do we have with the firmware?

Thanks for your help!

Pushing Mongo now…

Hi there,

Please let us know the results of your test on the Pi 3 so we can continue helping you debug this issue.