Safe way to update over poor connection

0xff · May 7, 2020, 4:32pm

Hello,

We have deployed to the field multiple instances of a 3-service application running on rPi0. In our lab we have had no issues running these apps or updating them. However, in the field, where the connectivity is poor, we have encountered balena engine and supervisor failures. Our working hypothesis is that the poor connectivity results in increased resource usage, which causes services to fail. This was previously discussed in the thread “Balena engine start failure”.

Per the recommendations in “Balena engine start failure”, we have trimmed our application to a single service. However, we are now encountering a different problem. When we initiate an update, the system deletes the two services that are no longer present in the new application, tries to update to the new application, and repeatedly fails, presumably because of the poor connection. When I then pin back to the previous release, it tries to download the images for the 2 services that it had deleted, and that fails too. This is a known problem; see “How to stop infinite download loop”.

Net result is that the system is now in an unstable condition - can’t go forward to the single-service app and can’t go back to the previous 3-service app. Device ID is 71b132b65186d1c80107c86a5af6cd35 and support is granted for reference. I can connect to the Host via web terminal w/o issues.

So I have 2 questions:

1. Is there a hack that would allow me to stop the repeat downloading so it doesn’t kill the data plan?
2. Is there a safe way to attempt to update other field devices so that I don’t fall into this predicament if the update fails?

jviotti · May 7, 2020, 4:49pm

Hey there! Thanks for coming to the forums. Have you seen the section of our documentation about reducing the bandwidth usage: https://www.balena.io/docs/reference/supervisor/bandwidth-reduction/? Do you think that implementing the tips discussed there could get data usage down enough that downloads wouldn’t be a problem?

0xff · May 7, 2020, 6:01pm

Thanks. I’m not concerned about data usage from request packets associated with connectivity checks etc. I’m concerned about repeated download attempts of 100MB updates, which, per previous interactions with balena support, seem to be tied to the inability to maintain a connection long enough to download the update.

Unfortunately, coupled with the known issues of treating each service separately rather than part of a holistic application, and the inability to resume download after an interrupted connection, this causes the problems outlined above.

My questions relate to whether it’s possible to work around these issues.

Ereski · May 7, 2020, 7:01pm

Downloads can be resumed with delta updates: https://www.balena.io/docs/learn/deploy/delta/#delta-behavior

0xff · May 7, 2020, 7:27pm

Ah, I missed this nuance of delta updates - it seems that delta updates can handle download interruptions whereas normal updates cannot. So even if the delta update requires the entire image to be downloaded, it’s more reliable over an unstable connection?

dtischler · May 7, 2020, 8:11pm

Hi there, yes, the default behavior of delta updates is such that: “Delta updates are resumable, so if the connection drops or otherwise stalls, the update will resume from the last byte received.” That could certainly help in the situation you have described.
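To illustrate why “resume from the last byte received” matters on an unstable link, here is a minimal simulation. It is only a sketch, not balena's delta implementation: the function names, the `drop_after` parameter, and the assumption that every connection drops after a fixed number of bytes are all hypothetical.

```python
def download_resumable(blob: bytes, drop_after: int, max_attempts: int = 50):
    """Simulate a transfer that resumes at the last byte received.

    Assumption: each connection delivers at most `drop_after` bytes
    before the link drops.
    """
    received = bytearray()
    attempts = 0
    while len(received) < len(blob) and attempts < max_attempts:
        attempts += 1
        offset = len(received)  # continue where the previous attempt stopped
        received.extend(blob[offset:offset + drop_after])
    return bytes(received), attempts


def download_restarting(blob: bytes, drop_after: int, max_attempts: int = 50):
    """Simulate a transfer that starts again from byte 0 on every attempt."""
    attempts = 0
    transferred = 0  # total bytes sent over the link, including wasted ones
    while attempts < max_attempts:
        attempts += 1
        chunk = blob[:drop_after]
        transferred += len(chunk)
        if len(chunk) == len(blob):  # succeeds only if one connection is enough
            return True, attempts, transferred
    return False, attempts, transferred


if __name__ == "__main__":
    image = bytes(range(256)) * 4  # stand-in for an image layer, 1024 bytes
    data, tries = download_resumable(image, drop_after=100)
    print("resumable:", data == image, "attempts:", tries)
    ok, tries, sent = download_restarting(image, drop_after=100)
    print("restarting:", ok, "attempts:", tries, "bytes sent:", sent)
```

In this model the resumable transfer always completes eventually, while the restarting one never completes when a single connection cannot carry the whole image and keeps consuming data on every retry.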

0xff · June 11, 2020, 9:06pm

Hi,

Asking for ideas to get around an unfortunate scenario that has happened in deployed systems:

1. The app on the device has 3 containers: Alpha, Beta, Gamma.
2. I created a new app with a single container, Gamma, built off a different base image, so with a large delta.
3. When updating the app over a slow connection, the supervisor first removed the images for Alpha and Beta and then tried to download the newer image for Gamma. That kept failing. I tried to revert by pinning back to the previous app release, but it then tried to download the old images for Alpha and Beta, which also kept failing, so I was left w/o a working app. This is a known update issue: https://github.com/balena-io/balena-supervisor/issues/1103

I tested a possible hack around this by creating 2 small dummy images for Alpha & Beta, and then started the update. They downloaded fine. In the middle of the Gamma download, I pinned back to the previous release. Once again, the system tried to download the old images for Alpha and Beta. This supports what the issue indicates: as soon as a new container image is downloaded, the old one is removed.

So wondering if anyone can think of another hack around this issue. As is, we are hesitant to update any apps in the field for fear of getting stuck in no-man’s-land.
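The two experiments described in this post can be reproduced with a toy model. This is a sketch based only on the behavior reported above, not on the supervisor's actual code: `attempt_update`, the image tags, and the `can_download` callback are hypothetical, and the model assumes that a service missing from the target release loses its image immediately, while a service still in the target keeps its old image until the replacement has been fetched.

```python
from typing import Callable, Dict


def attempt_update(
    device: Dict[str, str],
    target: Dict[str, str],
    can_download: Callable[[str], bool],
) -> Dict[str, str]:
    """Return the service -> image mapping after one update attempt."""
    # Services absent from the target release are dropped right away.
    result = {svc: img for svc, img in device.items() if svc in target}
    # A new image replaces the old one only once its download succeeds.
    for svc, img in target.items():
        if result.get(svc) != img and can_download(img):
            result[svc] = img
    return result


if __name__ == "__main__":
    old = {"alpha": "alpha:v1", "beta": "beta:v1", "gamma": "gamma:v1"}
    new = {"gamma": "gamma:v2"}

    def link_down(image: str) -> bool:
        return False  # every download fails on the poor connection

    # Experiment 1: update to the single-service release, then pin back.
    state = attempt_update(old, new, link_down)
    state = attempt_update(state, old, link_down)
    print("after update + pin back:", state)

    # Experiment 2: small dummy images for Alpha and Beta download fine.
    dummy = {"alpha": "alpha:dummy", "beta": "beta:dummy", "gamma": "gamma:v2"}

    def only_small(image: str) -> bool:
        return image.endswith(":dummy")

    state = attempt_update(old, dummy, only_small)
    state = attempt_update(state, old, link_down)
    print("after dummy update + pin back:", state)
```

In both runs the device ends up matching neither release, which is the no-man’s-land described above.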

saintaardvark · June 11, 2020, 10:39pm

Hi there – thanks for the additional details about the problems you’re encountering.

As you note, the question of application update strategies is currently being tracked at https://github.com/balena-io/balena-supervisor/issues/1103, and being discussed internally. We will update that ticket when we have a solution ready to roll out.

All the best,
Hugh

0xff · June 11, 2020, 10:43pm

I guess that means there are no known workarounds or hacks and that I need to wait for the issue to be resolved.
