Slow download speed causes restart of download

Hi

I have an RPi in Brazil on a super slow connection. It starts downloading some docker images but never manages to complete them, because the supervisor keeps getting restarted.

Is there a way to download the images manually, or prevent the supervisor from doing its healthchecks for some time?

Hi, thanks for contacting support.
We have an open request to allow for configurable healthcheck timeouts: https://github.com/balena-os/meta-balena/issues/1724.
Similar issues were reported in the forums thread “Supervisor keeps restarting during image download”. You could try a similar workaround, that is, remounting the root filesystem read-write, modifying the WatchdogSec value in resin-supervisor.service, and restarting the service.
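Roughly, that workaround looks like the following sketch (the unit name and file location are assumptions and vary between balenaOS versions, so locate the real ones first with systemctl status):

mount -o remount,rw /    # root is read-only by default
# Raise the systemd watchdog timeout in the supervisor unit file
# (assumed path; check with: systemctl status resin-supervisor):
sed -i 's/^WatchdogSec=.*/WatchdogSec=3600/' /lib/systemd/system/resin-supervisor.service
systemctl daemon-reload
systemctl restart resin-supervisor.service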
Hope it helps.

Thanks, I was looking at balena-engine and its timeout, but noticed root was RO.

mount -o remount,rw /      # root is mounted read-only by default
vi balena-engine.service   # changed WatchdogSec=4000
systemctl daemon-reload    # pick up the edited unit file
# docker had completely filled up with failed downloads…
rm -rf /mnt/data/docker    # wipe the partial layers to free space
reboot

I would say it’s not so much a healthcheck timeout increase that’s needed, but a way to keep the healthcheck passing while downloads are in progress?


Hi @axlrod, you are correct, you have to remount / as RW in order to edit the service files. Can you clarify your second question?

Well, I’m saying that docker image downloads that are clearly progressing shouldn’t cause a watchdog trigger on the balena service. I don’t know how it’s checked, but I assume that is not how it was intended.
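If it works like a standard systemd watchdog, the unit has to send periodic WATCHDOG=1 pings, and systemd kills and restarts it when none arrive within WatchdogSec, no matter how much useful work (like an image pull) is going on. A toy sketch of that mechanism, where do_health_check is a hypothetical placeholder (a shell script would also need NotifyAccess=all to notify from a subprocess):

#!/bin/sh
# Minimal Type=notify service loop for a unit with WatchdogSec set:
systemd-notify --ready
while true; do
    # Pet the watchdog only while healthy; ideally “healthy” would also
    # cover “a download is still making progress”.
    do_health_check && systemd-notify WATCHDOG=1
    sleep 30
done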

Hey.

“downloading some docker images”

Can you confirm, as it’s not obvious from the thread, that you’re referencing your service images from balenaCloud here?

Correct

hey @axlrod,
yes, the supervisor or balena service shouldn’t restart if a download is in progress. Could you provide logs for your device, or enable support access and provide the device URL?

[removed uuid]

You have a week of access.

Feel free to test things / redeploy the service; bear in mind the watchdog is set to 4000 sec on the balena service at the moment.
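You can double-check the value that’s actually in effect with systemctl (the unit may be named balena.service or balena-engine.service depending on the OS version):

# WatchdogUSec reports the active watchdog timeout for the unit:
systemctl show balena.service -p WatchdogUSec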

I was checking the device, ran the diagnostics, and it seems that the device is struggling with slow disk writes. I am not sure if that is related to the original issue, but it might point to a corrupted or low-quality SD card, so you might want to have a look at it. Regarding the original issue, what I see in the logs is HTTP 503 status from the API, which might be the reason why the supervisor restarted. Have you witnessed this behavior multiple times, or just this one time?
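As an aside, a quick way to sanity-check raw write throughput on the card is something like this (the test path is just an example; remove the file afterwards):

# Write 64 MB with direct I/O so the page cache doesn’t mask slow media:
dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=64 oflag=direct
rm /mnt/data/ddtest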

If the watchdog is at the default timeout and I push a new version, the device starts downloading the images and fails at around 60% because the supervisor restarts… it then starts again from 0%, and it keeps doing that until /mnt/data is full. I’m guessing it doesn’t clean up properly.
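Checking and reclaiming that space by hand should look something like this (balena-engine mirrors the docker CLI, so these are the docker equivalents):

# How full is the data partition, and what is the engine holding on to?
df -h /mnt/data
balena-engine system df
# Remove dangling layers left over from failed pulls:
balena-engine image prune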

Hi,

Looks like all services are working right now. Looking at the logs, I can see that the kernel killed a couple of its threads because they were hanging. Both threads were in code paths that, judging from the stack trace, were doing operations on the SD card. This, together with the diagnostics @sradevski referenced above, indicates that the SD card is getting overloaded.

This could very well be the root cause of the supervisor restarting: slow SD card IO while downloading an image (which is IO-intensive) triggering the kernel’s hung task checker. The supervisor would then be killed and restarted, bypassing the watchdog. However, the supervisor hasn’t restarted since the last reboot and all services are already updated, so there are no logs to confirm this hypothesis.

It would be best to replace the SD card (from our experience, the SanDisk Extreme Pro works very well). If that is not possible, you could try increasing the kernel’s hung task timeout; see http://beautifulwork.org/hung_task_timeout_secs-hung-task-timeout/ for an example of how to do that.
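On the device that would look roughly like this (600 is just an example value, and the setting resets on reboot unless persisted):

# Report hung tasks after 600 s instead of the default 120 s; 0 disables the check:
sysctl -w kernel.hung_task_timeout_secs=600
# Verify the current value:
cat /proc/sys/kernel/hung_task_timeout_secs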

Please update us if this solves the issue for you.

Oh, my issue was solved by remounting / and changing the timeout. The SD card is already a high-quality, high-speed card; I think it’s more related to the download speed on that line…

Good to know that it worked out for you. Please let us know if you need further support.