Delta size not shown with every "balena push" to the cloud

Does anyone from the Balena team have any thoughts on this? This is really slowing down our development process. Builds are taking 5-10x longer than they should, so iterating quickly on a bug is basically an all-day task.

Hey guys,

Wanted to follow up on this again. Any thoughts here - @shaunmulligan, @sradevski, @dfunckt, anyone?

Creating the “secondary” deltas is killing our build times and making it really difficult to quickly iterate and debug issues. For example, I’m pushing tiny changes to some Python code to a single device right now. They should have deltas of a few KB, but each build ends up creating and uploading 60 MB “secondary” deltas because some other devices happen to be powered on and running an old release.

I’m not working on those devices and the release I’m creating will never go on them, but the build system is generating the deltas anyway and I can’t stop it. As a result, a build that should finish in less than a minute is taking 7-10 minutes. That means that if I want to try a bunch of things quickly it takes hours instead of minutes.

Is there any hope of fixing this, or at least adding a CLI option to disable it?

Thanks,

Adam

Hey, apologies for completely ignoring your last two messages, but there was an internal issue preventing this conversation from being surfaced in our internal support system. I don’t have the answer to your question right now, but I’ve made sure someone will get back to you here to discuss.

Awesome, thanks. I appreciate it.

@dfunckt any thoughts or progress on this? The long builds slow our developers down tremendously when they’re debugging. Is there any way to disable automatic delta generation maybe?

Bumping this ticket. @dfunckt any thoughts on this?

@adamshapiro0 can you help me better understand your use case? You have a fleet of devices that you use only for testing, and you iterate on releases that you don’t intend to deploy. Is that correct?

To answer a few of your questions – indeed, at build-time we’re creating deltas between two sets of releases: one between the latest release and the one just built, and one between the release most online devices are on and the one just built. Identical images should yield deltas of zero size – if you’re observing something different then either the images aren’t identical, or there’s a bug on our side.
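To make the two build-time deltas concrete, here is a minimal Python model of which release pairs get a delta when a release is built. It is a sketch under stated assumptions, not balena's builder code: `Device`, `build_time_delta_pairs`, and release names such as `r42` are hypothetical, and the real pipeline may apply additional rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical device record holding only what this sketch needs.
    uuid: str
    release: str
    online: bool


def build_time_delta_pairs(new_release, latest_release, devices):
    """Return the (source, target) release pairs a build would create deltas for.

    Models the two deltas described in this thread:
      1. latest release -> new release
      2. release most online devices are running -> new release
    Pairs whose source equals the new release are skipped (nothing to diff).
    """
    pairs = set()
    if latest_release is not None and latest_release != new_release:
        pairs.add((latest_release, new_release))

    # Count releases across online devices only; offline devices are ignored.
    online_releases = Counter(d.release for d in devices if d.online)
    if online_releases:
        most_common_release, _count = online_releases.most_common(1)[0]
        if most_common_release != new_release:
            pairs.add((most_common_release, new_release))
    return pairs


# Scenario from this thread: the last dev build is r41, but most online
# vehicles are still running an older release, r12.
fleet = [
    Device("bench-1", "r41", online=True),
    Device("vehicle-1", "r12", online=True),
    Device("vehicle-2", "r12", online=True),
    Device("vehicle-3", "r41", online=False),
]
print(sorted(build_time_delta_pairs("r42", "r41", fleet)))
# [('r12', 'r42'), ('r41', 'r42')]
```

In this model the `r12` to `r42` pair is the “secondary” delta: it is generated even if `r42` is only ever deployed to `bench-1`, and it can be large when `r12` is much older than the new build.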

A key thing to mention is that the cloud build and deployment pipeline optimises for production delivery, and in that sense producing deltas at build time makes sense: it trades a longer time to complete the release for a shorter time for devices to receive the update. I can see how that can be counter-productive when used for development, though. What forces you to use this workflow? Have you tried “local mode”?

(This conversation remains invisible to our support system, and I only found your replies by accident. I’ll see if we can sort it out on our end, and I’ll also keep an eye out for your replies, but please do start a new conversation if we seem unresponsive. Apologies.)

@dfunckt that is correct. We have a number of test vehicles, and we can deploy test builds to one or many of them at any time for development. Those test builds never get pushed directly to customers. We also have a number of bench devices of course, but many of those are operated remotely much of the time.

Local mode actually came up on another thread (Release builds getting "cancelled" unexpectedly. - #18 by adamshapiro0) and unfortunately it doesn’t work for our use case. Copying from that post:

  • Devices must be available on your local network
    • Our devices are often deployed in moving vehicles, and the developer testing against them may not be in the vehicle
    • We often test a single build on multiple devices, so we wouldn’t be able to simply pin the devices to an existing dev build. Unless everyone is very careful, this is potentially error-prone or could lead to inconsistencies if someone accidentally builds differently, etc., which could be difficult to recognize and track down
  • Environment variables from the cloud do not apply
    • We use the environment variables for a number of things, including turning some development settings on/off on the fly. Having to manually set them in the docker compose file to keep them in sync with the settings in the cloud every time we do a quick development test isn’t really feasible and is error-prone
    • To deploy on multiple devices, the developer would have to set up the compose file for each device by hand, make sure they got all the settings right (which takes time and is easy to get wrong), and then run separate builds for each of those devices

Our preference would definitely be to use the Balena cloud build servers for development builds. Worth mentioning that the Balena documentation specifically recommends using the remote build servers for development:

balena push is the recommended method for deployment and development on the balenaCloud platform.

Ideally, we would love to have an option for balena push (or something similar) to simply tell the build pipeline how it should generate deltas. I’m not sure what it would take to implement, but my preference would be to generate one delta from the “last build generated by this user”. That way, someone iterating quickly on a group of devices would generate only the delta they needed, and the delta sizes printed to the console at the end of the build would reflect the actual size of the delta to their devices instead of the size of a hypothetical push to customer devices that will never occur.
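The proposed per-user policy could look roughly like the following Python sketch. Everything here is hypothetical: `Release`, `created_by`, and `proposed_delta_pair` are illustrative names rather than part of any balena API, and the sketch assumes the pipeline can tell which user produced each release.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Release:
    # Hypothetical release record; balena's real release model differs.
    id: str
    created_by: str


def proposed_delta_pair(
    new_release: Release, history: Sequence[Release]
) -> Optional[Tuple[str, str]]:
    """Sketch of the proposed policy: generate a single delta, from the most
    recent earlier release built by the same user to the new release.

    Assumes `history` is ordered oldest-first and does not include
    `new_release`. Returns None if the user has no earlier release.
    """
    for release in reversed(history):
        if release.created_by == new_release.created_by:
            return (release.id, new_release.id)
    return None


history = [
    Release("r12", "ci"),    # official release most vehicles run
    Release("r40", "adam"),  # adam's previous dev build
    Release("r41", "sam"),   # someone else's dev build
]
print(proposed_delta_pair(Release("r42", "adam"), history))
# ('r40', 'r42')
```

The trade-off of this design is that devices on any other release (such as `r12` above) would have no pregenerated delta from this push.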

Alternatively, honestly, an option to simply disable automatic delta generation would work as well. Customer releases don’t happen nearly as often as development builds, so we’d rather not waste the time (or Balena’s cloud storage for deltas that will never be used). The reality is that the first customer to the gate to download a new release probably won’t even notice if they wait slightly longer than usual or than other customers, and if that really became a problem we could always deploy the release to our test vehicles immediately to prevent it. In practice, we always deploy official releases to our vehicles, so that would probably be a non-issue.

Thanks for this context @adamshapiro0!

Quick question: is this set of test/dev devices in the same application as the production ones? If they aren’t, a short term workaround would be to disable deltas for these devices, but if they are this won’t work. I don’t want to steer you further out of the “happy path” though, so I’ll share your use case with the team regardless and see what comes out of it and let you know.

Yes, they are in the same application. Most of them operate on cell, though, so we’d really prefer not to disable deltas, since that would mean a fair amount of wasted bandwidth, and for the development devices in particular the software updates might be quite frequent. We certainly don’t want to disable them application-wide, since that would mean large pushes to customer devices as well.

Even if we did disable deltas for specific devices though, I’m not sure how that would address the problem? balena push isn’t aware of which device you’re pushing to, so wouldn’t it end up generating the same two deltas anyway?

Out of curiosity though, how do you disable deltas? The enabling delta updates section in the documentation doesn’t actually tell you how to enable them, just that they are enabled by default. Seems like an oversight.

If the application only contained those test devices and deltas were disabled for them, then during build we’d assume there’s no point in creating deltas at all (either one of the two we create) as they won’t be used.

Deltas must be disabled for all online devices individually for the builder to skip generation. We think that deltas are ultimately an implementation detail that users need not be concerned about; they should just rely on the platform to deliver updates as fast, efficiently and reliably as possible. From the perspective I mentioned above, that the build+deploy pipeline is really optimising for delivery, it follows that disabling deltas is suboptimal. That said, your use case isn’t currently fully covered by the tools we provide to aid development, and you’re forced to go through this pipeline, so we’ll need to come up with a way to accommodate it. We are working to improve the time it takes to generate deltas, but we may also find that adding the ability to punt delta generation makes sense now that we have a concrete use case like yours.
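The skip rule can be modelled as a simple predicate over the application's devices. This is a hedged Python sketch, not the builder's implementation: `Device`, `deltas_enabled`, and `should_generate_deltas` are hypothetical names, and mapping `deltas_enabled` to each device's `RESIN_SUPERVISOR_DELTA` setting is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical device record for this sketch.
    uuid: str
    online: bool
    # Assumed to mirror the device's RESIN_SUPERVISOR_DELTA setting.
    deltas_enabled: bool


def should_generate_deltas(devices):
    """Return False only when every online device has deltas disabled.

    Offline devices are ignored. With no online devices at all this also
    returns False (every online device is vacuously "disabled"); how the real
    builder treats that edge case is not stated in this thread.
    """
    return any(d.deltas_enabled for d in devices if d.online)


test_fleet = [
    Device("vehicle-1", online=True, deltas_enabled=False),
    Device("vehicle-2", online=True, deltas_enabled=False),
    Device("vehicle-3", online=False, deltas_enabled=True),
]
print(should_generate_deltas(test_fleet))  # False

mixed = test_fleet + [Device("customer-1", online=True, deltas_enabled=True)]
print(should_generate_deltas(mixed))  # True
```

This illustrates why the workaround only helps when the application contains nothing but test devices: a single online customer device with deltas enabled brings generation back.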

OK, I’m still a little unclear on this, though:

I didn’t see any options in the dashboard to disable deltas. Is that what RESIN_SUPERVISOR_DELTA is?

In any case, we really want our test devices to be in the same application, since we always deploy official releases to at least some test vehicles and wouldn’t be able to do that if they were in a different application (at least not without having to move them back and forth between applications every time, which is not really a great option). From our perspective, an “application” represents a “product”, including the test devices for that product.

Absolutely, I didn’t want to suggest you should split the fleet in two apps – that’s what I meant when I said I don’t want to steer you further away from the happy path. If you did use separate apps however, the workaround would work until we sort the issue natively.

Found the answer. It looks like RESIN_SUPERVISOR_DELTA is the way to disable them, but the documentation was removed from the configuration page at some point; not sure why. The old version of the page currently in the Google cache lists that variable and some others:

@dfunckt one related thought: is there any way, via the CLI or otherwise, to explicitly A) create a delta between release A and B if it doesn’t already exist, and then B) print out the size of a delta from A to B? This would be really useful when a customer asks “how big of a download will the next update be?” It would be especially useful:

  1. For devices that may be running older releases currently so the deltas printed for the current “most popular build” might not be relevant
  2. If delta generation fails during the build so nothing prints at all (just happened to us for our latest release)
  3. If you do end up making changes to the automated delta generation, that way we can still ultimately know how big of a download it will be no matter when it was built and by who

A lot of our customers operate on cell most of the time, and may want to be able to decide if they want to do an update and use their SIM bandwidth, or wait until they have a wifi/ethernet connection available. Right now we don’t have any good way of determining how big of a download would be required for a given device.

Also, for the first vehicle we put the new release on (the one that failed to generate deltas), the download took a really long time. The console output seems to indicate, as expected, that the deltas didn’t exist, but then it’s very unclear if it generated them automatically or just downloaded complete images:

01.04.21 12:19:22 (-0400) Downloading delta for image 'registry2.balena-cloud.com/v2/1cc981aa6041465e12bb6e2970288255@sha256:a152bb68115198fe8e468057330f4728f22ee5ddb2d2e479cb2205184d241860'
01.04.21 12:19:52 (-0400) Failed to download image 'registry2.balena-cloud.com/v2/1cc981aa6041465e12bb6e2970288255@sha256:a152bb68115198fe8e468057330f4728f22ee5ddb2d2e479cb2205184d241860' due to '(HTTP code 404) no such image - no such image: registry2.balena-cloud.com/v2/1cc981aa6041465e12bb6e2970288255:delta-b8f8c9665c8247ca: No such image: registry2.balena-cloud.com/v2/1cc981aa6041465e12bb6e2970288255:delta-b8f8c9665c8247ca '
01.04.21 12:20:14 (-0400) Downloaded image 'registry2.balena-cloud.com/v2/8bcba24db3dca5c8314f8069f56ae4c6@sha256:5f7c0cffb94f42315ddfd605c32b1807cec1dd8645581aa979477acfe5b7d461'

Downloading full images is really bad, especially if it happens over cell. Some of our containers are very large (200 MB+).

@adamshapiro0 we have ongoing work to enable clients such as the Dashboard, CLI and SDK to generate deltas between arbitrary releases. This is part of a somewhat large refactor but we should have news in the coming weeks.

It may feel unintuitive but it’s impossible to get the size of a delta (or even an estimate) without first generating it. Having the ability to pregenerate any delta, and therefore get its size, should help though.

Can you please reach out via private support chat from the Dashboard so we can take a look?

@dfunckt That makes sense and is pretty much what I had figured - hopefully the new delta generation changes will help at least. I’m very interested to know more about what you’re changing.

FYI I reached out via the support chat on the dashboard, but to be honest, so far it’s not a good experience. It’s tiny (30 characters wide and can’t be resized horizontally; it can be resized vertically, but only to ~15 lines before it goes off the bottom of the screen), so it’s horrible to type in. There are also no sounds, desktop notifications, or anything, so I never know when there’s a response if I’m working on something. That wouldn’t be a big deal if the responses were immediate, which is what a live chat is usually good for, but so far it’s been an hour between responses. The chat feature leaves a lot to be desired compared with the forum, or even just email, where you get asynchronous notifications of new posts. I’ve never needed to use chat for stuff like this before, so I’m not really sure why the forum isn’t the preferred solution.

Hey. I registered your feedback about the less-than-ideal support widget user experience. Thanks for that!

Like my colleague said, we’ll keep you posted about the news regarding deltas.