Startup and Shutdown problems #balena_sense

I shut down both the Enviro+/RPi device and my desktop Mac last evening. This morning my application failed to boot, giving the error message “tunnelling socket could not be established: statusCode=500”.
The device’s cloud logs show the supervisor starting but nothing further. I even re-installed balena-sense-master on the Mac and pushed it to the cloud, but nothing changed, even after selecting restart; the same error message is repeated.
I’ve looked through the documentation, which I find rather confusing and remiss on this topic.
What are the correct procedures for both shutting down and starting up the project, please?


Hi there,
It seems this error is related to an older version. Could you please let me know your balenaOS and supervisor versions?
You can always reboot the device from the balenaCloud dashboard, just navigate there and press the button. Just take into consideration that if you reboot your device all your previous logs will be removed, unless you have enabled persistent logging.
Georgia

@roj @georgiats - Have a look from the mid-to-bottom of this thread: Possible to get BalenaSense to display temp in Fahrenheit?

–chris

Many thanks to both @georgiats and @chrisism for responding.
BalenaOS is 2.48.0+rev1, the sense version is 1.9.3 and the supervisor version is 10.8.0.
I did have this same problem 4/5 days ago when I first started this project, with no satisfactory solution coming from the forum. So I started again from the beginning: re-flashing the 16GB SanDisk Edge A1 10 and setting up the Enviro+ and RPi0 with the cloud. It all worked perfectly, until this morning, following 12 hours of down time.
I had assumed that it was a corrupted install and said so in my previous Forum message. But now, reading what @chrisism suggested on another thread, I’m beginning to think this is a Balena problem.
@nbeck quoted: I have this exact same issue with my balena sense installation. It just stops working and the supervisor is stuck in a failed to start state, even without a pi reboot or shutdown. The only way I have found to fix this is to re-image the SD Card and redeploy the application.
So that’s what I’ll do again and report back here.
Many thanks again to @chrisism and @georgiats, and also to @nbeck


Hello, hopefully there should be an easier way to recover from this than re-flashing the application. If you have physical access to the device could you try power-cycling it and let us know if it is still having the same issue?

In my case, it is a loss of power (via cycle or even a reboot) that causes it to remain in a failed state.

So I’ve now refreshed the Balena OS image on the RPi0, connected up, powered on, and having allowed what I consider to be sufficient time for the cloud dashboard to connect - NOTHING!
Surely I do not have to set up yet another application?
And in answer to @nazrhom I did power-cycle the RPi0 but no change.
So where to next, please Balena? This is becoming boring and unnecessarily time consuming.


Hi there, from what we typically see, a device flashed with a factory OS image that isn’t coming up in the dashboard suggests either (a) corrupted flash media; or (b) network connectivity issues.

Can you see if your device appears on your local network (via your local router’s DHCP tables) and, if you connect a screen to it, is there video output?

We also commonly see OOM (out-of-memory) issues on RPi Zeros, since these have very little memory, potentially leading to the Linux kernel killing processes seemingly at random. It would be good to see the output of free -m when the device first starts vs. a number of hours later.
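One way to capture that comparison is to append timestamped snapshots to a file and compare the first entry with a later one. The following is a minimal sketch, not an official balena tool: the function name log_memory and the log path are hypothetical, and it assumes a POSIX shell on the device.

```shell
#!/bin/sh
# Append a UTC timestamp followed by the output of `free -m` to a log file.
# The log file path is supplied by the caller (hypothetical; pick your own).
log_memory() {
    log_file="$1"
    {
        date -u '+%Y-%m-%dT%H:%M:%SZ'
        # If `free` is missing or fails, record that instead of aborting.
        free -m || echo "free -m failed"
        echo
    } >> "$log_file" 2>&1
}
```

For example, log_memory /mnt/data/free.log could be run once right after boot and again a few hours later; /mnt/data is only an assumed writable location, so substitute any path that persists on your device.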

#status multiple issues in the ticket.

We’ve reproduced this according to FD above and currently investigating. #pendingengineerresponse

  • re-flashed (refreshed?) device not appearing on the dashboard

… for reference, tunnelling socket could not be established: statusCode=500 error means the API can’t talk to the supervisor via the SSH/proxy/VPN, which means the supervisor is probably not running/restarting/crash-looping.

Advising user to check the obvious things first (network/media), since a re-flashed device should appear on the dashboard regardless before even beginning to install a release.

@ab77 On my end, network, media, and memory are fine.

Thanks @ab77. My router’s DHCP table shows the device as not being connected. All other devices are showing as connected.
How can I check that the network ID and PW are correct on this device, with the dashboard unavailable and the device running headless?
I have also now reflashed the device twice, giving the correct network ID and PW within the setup panel.
Do I have to create a new device?
Thanks and regards.

Hi @roj

Do you have physical access to the device, i.e. is it possible for you to remove the SD card from your device and insert it into your computer? And then check the contents of the file /resin-boot/system-connections/resin-wifi, which should contain the configured wifi SSID and PW?
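With the SD card mounted on your computer, the relevant lines can be pulled out of that file without opening an editor. This is a small sketch that assumes the file uses the NetworkManager keyfile layout (ssid= and psk= entries); the function name show_wifi_credentials and the mount point in the usage note are hypothetical.

```shell
#!/bin/sh
# Print only the SSID and passphrase lines from a connection file.
# Assumes keyfile-style entries of the form `ssid=...` and `psk=...`.
show_wifi_credentials() {
    grep -E '^(ssid|psk)=' "$1"
}
```

On a Mac the call might look like show_wifi_credentials /Volumes/resin-boot/system-connections/resin-wifi, assuming the boot partition mounts under that name. Check the output for typos, stray spaces, or an SSID in the wrong case.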

Kind regards
Alida

Many thanks @AlidaOdendaal
The ~/resin-wifi SSID and PW are correct, but I’ve set up a new development device and it’s now functioning fine.
Kind regards,
Roj

Any update yet on this shutdown/restart problem, please? It’s 11 days now since the issue was raised by @chrisism and it was confirmed that Balena engineers are working on it, @georgiats.
I shut down my Balena-Sense project last night to make some changes to the hardware location, but when I came to restart it we were back to the same old problem: it won’t restart! It’s the tunnelling error showing again, with error code 500.
The supervisor starts and then hangs, but the services are all running.
Cannot go into production with this problem hanging over the project.
More help required please?
Regards,
@roj


Hi, investigating this has been quite the ride :upside_down_face:

The explanation is that the service manager of balenaOS (systemd) sees the container engine taking longer than the default start timeout to come up (likely because of the lower specs of the Pi Zero), and so terminates the service. This of course stops the application and also terminates the supervisor.
More details can be found on the meta-balena repository here: https://github.com/balena-os/meta-balena/issues/1910#issuecomment-637605110

For a quick and dirty workaround, until we release a new OS version that fixes this, one could do the following on the host OS:

$ mount -o remount,rw /
$ vi /etc/systemd/system/balena.service.d/timeout-fix.conf
# add the following:

[Service]
TimeoutStartSec=0

$ mount -o remount,ro /
$ systemctl daemon-reload && systemctl restart balena
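
For anyone not comfortable with vi, the same drop-in file can be written non-interactively with a here-document. This is a sketch of the editing step above, not an official procedure: the function name write_timeout_dropin is hypothetical, and the directory is passed as an argument so it can be tried on a scratch directory before touching the host OS.

```shell
#!/bin/sh
# Write timeout-fix.conf into the given systemd drop-in directory.
# The directory is created if it does not exist yet.
write_timeout_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # Quoted 'EOF' keeps the contents literal (no variable expansion).
    cat > "$dropin_dir/timeout-fix.conf" <<'EOF'
[Service]
TimeoutStartSec=0
EOF
}
```

On the device this would replace the vi step: after mount -o remount,rw /, run write_timeout_dropin /etc/systemd/system/balena.service.d, then continue with the read-only remount and the systemctl commands shown above.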

Thank you for the patch. In trying to implement the change, I’m using the terminal instance on my device page (Host OS). After I add:

[Service]
TimeoutStartSec=0

I try to exit out of vi using :x enter and cannot exit. I’m not too keen on vi as I usually use pico. If I exit after editing the file, will the code commit?

Thanks!

Hi,
Try :wq to write the changes and quit the vi editor.
Regards Thomas

Thank you, Thomas. That worked. I pushed the changes and restarted. Test to follow.

I have added the timeout fix and allowed balena to restart. After 15 minutes of allowing the system to sit, I disconnected power to the unit and reconnected. The system starts as normal with no noted deficit.

I’ve tested it three times with the same return to normal status. I’m confident it’s fixed.

Thank you for your time and attention to this matter and I genuinely appreciate the support of all involved.

–Chris

No worries, happy to help