Services not updating

We have recently had issues with a device that is unable to update its services. Here are some details about the device in question. It is running from a fresh SD card with a fairly recent version of balenaOS.

We have tried a number of things in order to get the services to update. I know some of the later steps are not recommended, but the device did not seem to be recoverable.

  • Enabled lock override
  • Disabled delta updates
  • Restarted a service due to be updated
  • Restarted the supervisor container (it stayed in starting state)
  • Ran device diagnostics (no problems were reported)
  • Removed some folders from /var/lib/balena
  • Power cycled the device

The device is now in support mode for a week. Any help is much appreciated.

Hi, I just had a quick look at the device. It has an uptime of 1 hour and all containers are up and running for about an hour too. Did you change anything? What errors am I looking for exactly?

The interface I have got suggests that the services have not updated:

image

That’s strange. When I access the balena dashboard I see all the services running, consistent with the output of balena ps -a in the host OS.
Could you try refreshing the browser page / clearing caches?

Over time my default refresh has become a hard refresh, and that has not solved the issue. Additionally, I have taken a look using another computer and can see there are still services which are downloaded but not replaced.

The device is offline at the moment but in my interface I still see a discrepancy between target release and current release. Do you see something different when accessing the device?

image

Hi there, are we looking at the same device? I see what alexgg is seeing as well. The device has been offline for 13 hours now. Can you check what’s going on and get back to us? Thanks

The device is at a colleague’s house; they will power it up in the next couple of hours.

I have just checked the device page again (and tried to paste the link again but the forum warned me that the link had already been shared). It is offline at the moment but is still saying there are services downloaded and not updated.

Here is the device details page where the current and target release are different. I am sharing it in order to identify why there is a discrepancy between the information seen on my dashboard and the dashboard that your team has access to.

I will post on here when the device is powered up again.

Hi there,
To better understand the problem, could you run the following snippet from the browser console? It should give us a fuller view of what is happening:

JSON.stringify(await sdk.models.device.get("8d6e200e7e4820a8d516f17ba05b0d58", {
	$select: ['id', 'device_name'],
	$expand: {
		is_running__release: {
			$select: ['id', 'commit'],
		},
		should_be_running__release: {
			$select: ['id', 'commit'],
		},
		belongs_to__application: {
			$select: ['id', 'app_name', 'should_track_latest_release'],
			$expand: {
				should_be_running__release: {
					$select: ['id', 'commit'],
				},
			}
		}
	}
}))

This command should output a string. It would be really helpful for us if you could copy/paste the result here.

The device is back online and here is the output to the aforementioned command.

"{"is_running__release":[{"id":1690179,"commit":"c30ce740108b5dda5d151a1bb82f3293","__metadata":{"uri":"/resin/release(@id)?@id=1690179"}}],"should_be_running__release":[{"id":1692510,"commit":"2eaed612949c770e4b3beccc819ad864","__metadata":{"uri":"/resin/release(@id)?@id=1692510"}}],"belongs_to__application":[{"should_be_running__release":[{"id":1692510,"commit":"2eaed612949c770e4b3beccc819ad864","__metadata":{"uri":"/resin/release(@id)?@id=1692510"}}],"id":1770574,"app_name":"396-ch5662-record-load-cell-ou","should_track_latest_release":true,"__metadata":{"uri":"/resin/application(@id)?@id=1770574"}}],"id":4143494,"device_name":"unhappy-device","__metadata":{"uri":"/resin/device(@id)?@id=4143494"}}"
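For anyone reading along, the output above can be checked mechanically. The following is only a minimal sketch: `summarizeReleaseState` is a hypothetical helper (it is not part of the balena SDK), the `__metadata` fields are trimmed from the pasted output for brevity, and it assumes that a release set on the device itself takes precedence over the application's target release.

```javascript
// Trimmed copy of the output posted above (the __metadata fields are omitted).
const output = JSON.stringify({
  is_running__release: [
    { id: 1690179, commit: "c30ce740108b5dda5d151a1bb82f3293" },
  ],
  should_be_running__release: [
    { id: 1692510, commit: "2eaed612949c770e4b3beccc819ad864" },
  ],
  belongs_to__application: [
    {
      should_be_running__release: [
        { id: 1692510, commit: "2eaed612949c770e4b3beccc819ad864" },
      ],
      id: 1770574,
      app_name: "396-ch5662-record-load-cell-ou",
      should_track_latest_release: true,
    },
  ],
  id: 4143494,
  device_name: "unhappy-device",
});

// Hypothetical helper (not a balena SDK function): compares the release the
// device reports as running with the release it is expected to run.
function summarizeReleaseState(jsonString) {
  const device = JSON.parse(jsonString);
  // Expanded relations come back as arrays; take the first entry if present.
  const first = (arr) => (Array.isArray(arr) && arr.length > 0 ? arr[0] : null);

  const running = first(device.is_running__release);
  const deviceTarget = first(device.should_be_running__release);
  const app = first(device.belongs_to__application);
  const appTarget = app ? first(app.should_be_running__release) : null;

  // Assumption: a device-level target wins over the application-level target.
  const target = deviceTarget || appTarget;

  return {
    deviceName: device.device_name,
    runningCommit: running ? running.commit : null,
    targetCommit: target ? target.commit : null,
    upToDate: Boolean(running && target && running.commit === target.commit),
  };
}

const summary = summarizeReleaseState(output);
console.log(summary);
```

With the values pasted above, the running commit and the target commit differ, which matches the discrepancy shown on the device page.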

Let me know if there is anything else I can provide. We will leave the device powered up and I think it should still be in support mode.

Hi there,
we noticed that the device was moved from one application to another. We would like to ask you a couple of questions that would help us understand the situation better:

  1. How did you push the code to your applications?
  2. Has the same commit been pushed to both applications?

  1. How did you push the code to your applications?

Code is pushed to what we call the “review apps” by our CI pipeline. At the moment this uses balena deploy running on an AWS ARM instance. Building for all our services takes place on the same EC2 instance, which shares its Docker cache. Therefore, when we build a review app or build the staging app, it can use the same cache.

  2. Has the same commit been pushed to both applications?

The same commit will not have been pushed to both applications, however, there is the chance that the same container has been pushed to both applications. Given the use of a shared cache, if a cache exists for a container already it would result in a different release but the same container.
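To illustrate what I mean, here is a rough sketch. Everything in it is hypothetical: the service names, the image digests, and the `unchangedServices` helper are made up for illustration and are not taken from our actual releases or from any balena API. It only shows how two different releases can contain byte-identical images when the build reuses a shared cache.

```javascript
// Hypothetical releases: each maps a service name to an image content digest.
// The names and digests below are placeholders, not real values.
const reviewAppRelease = {
  label: "review-app release",
  images: {
    "service-a": "sha256:aaaa",
    "service-b": "sha256:bbbb",
    "service-c": "sha256:cccc",
  },
};

const stagingAppRelease = {
  label: "staging-app release",
  images: {
    "service-a": "sha256:aaaa", // cache hit: identical image
    "service-b": "sha256:bbbb", // cache hit: identical image
    "service-c": "sha256:dddd", // rebuilt: different image
  },
};

// Hypothetical helper: lists services whose image is identical in both
// releases, i.e. services that would not need replacing on the device.
function unchangedServices(current, target) {
  return Object.keys(target.images)
    .filter((name) => current.images[name] === target.images[name])
    .sort();
}

const unchanged = unchangedServices(reviewAppRelease, stagingAppRelease);
console.log(unchanged);
```

In this made-up case the release is different, but two of the three services already run the exact image the new release asks for.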

Thanks for your excellent questioning :grinning_face_with_smiling_eyes: It seems that there is no actual issue; the only problem is that the dashboard claims an update has not completed when in reality it does not need to complete.

Hello, I just wanted to let you know that we’ve been able to identify an issue which we’ve been able to reproduce locally. I’ll be looking into this more on Monday and will provide some updates then.

Hey, I’ve resolved some issues that I encountered while trying to reproduce the problem locally, but I’ll still need a bit more time to troubleshoot your device. Could you please re-enable support access in the balena dashboard, as it will have expired by the time I resume in the morning? Thank you.

Thanks for your help with this. I have enabled support access for another week.

Hey, the device appears to be offline. Can you let us know when it’s back up? Thanks!

Hi there. Is this still an issue, or has it been solved? Thanks!

Sorry about this, the device is at a colleague’s house and they are really struggling with their internet. I will post here when the device is back online so it can be investigated (which will not be for a number of days still). This issue is not currently blocking us as the device is not in use.

Thanks for letting us know, no problem!

Sorry for the delay, @20k-ultra and @rcooke-warwick. I have got the device back at my own home, which is blessed with more reliable wifi.

The device is powered up with another week of support mode. I will aim to keep it powered up during GMT working hours while I am at my desk. Hopefully this will provide a suitable window of opportunity.

Let me know if there is anything I can do to support debugging.