I’m having connectivity/performance issues with my devices:
ports that are supposed to be hosting web applications are not responsive
shell sessions inside containers are very slow
web diagnostics don’t work at all and fail with the following messages: (1) “Error reported when querying checks data: ssh client socket error while initiating SSH connection”, (2) “ssh client socket error while initiating SSH connection”, (3) “6x - Supervisor Current State error: tunneling socket could not be established, cause=socket hang up”
I made a few changes before the device started to behave in this way:
hardware migration from Raspberry Pi 3B+ to Raspberry Pi 3B
balenaOS migration from 32-bit to 64-bit
base image migration for one of the containers from Ubuntu 18.04 32-bit (Bionic) to Debian 10 64-bit (Buster)
I could go back and methodically test the effect of each of them, but I wonder whether it’s possible to inspect the device and find the problem more directly. Maybe it’s something as simple as insufficient RAM/CPU…
Could someone advise me on the best way to inspect the device if the web diagnostics fail?
Alternatively, an example device is 6b007f194afc8899597f986feab31193, and I’ve granted support access to it for a week! I’ll just need to refresh some secrets stored in env vars once this is done!
These errors suggest the device hasn’t booted up correctly; specifically, the Supervisor doesn’t seem to be starting. When you say “Balena OS migration from 32bit to 64bit”, how did you go about doing that?
Hey, the best way to check manually is to use the web-based terminal to connect to the device’s host OS, where some basic tools are available. I took a quick look at the device you shared, and it does seem to be very heavily loaded (1-, 5- and 15-minute load averages: 20.58 10.74 4.18), which is almost certainly the cause of the slowness.
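To put those numbers in context: the Raspberry Pi 3B has four CPU cores, so a 1-minute load of 20.58 means roughly five runnable (or I/O-blocked) tasks per core. Below is a minimal sketch, assuming Python 3 is available in one of your containers (the host OS may not ship it), that prints the per-core load and available memory straight from /proc. The helper names are hypothetical. In a typical container setup /proc/loadavg and /proc/meminfo are not namespaced, so the figures describe the whole device rather than just that container.

```python
import os


def per_core_load(load_averages, cores):
    """Divide the 1-, 5- and 15-minute load averages by the core count.

    A value above 1.0 means that, on average, more tasks were runnable
    (or in uninterruptible I/O wait) than there are cores.
    """
    return tuple(load / cores for load in load_averages)


def mem_available_mib(meminfo_text):
    """Return the MemAvailable figure from /proc/meminfo text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Lines look like "MemAvailable:     204800 kB" (value in KiB).
            return int(line.split()[1]) // 1024
    raise ValueError("MemAvailable not found in meminfo text")


def main():
    cores = os.cpu_count() or 1
    loads = per_core_load(os.getloadavg(), cores)
    print(f"cores: {cores}")
    print("load per core (1/5/15 min): " + " ".join(f"{x:.2f}" for x in loads))
    # /proc/meminfo only exists on Linux; skip the memory line elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(f"MemAvailable: {mem_available_mib(f.read())} MiB")


if __name__ == "__main__":
    main()
```

With the load averages quoted above and four cores, `per_core_load((20.58, 10.74, 4.18), 4)` gives roughly 5.1, 2.7 and 1.0 per core; the climb from the 15-minute to the 1-minute figure suggests the load had been building over the preceding few minutes.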
@dfunckt, I created a whole new application with a different base platform. I also used 64-bit images for both containers:
@_Page, it certainly feels like it! Thanks for confirming it!
Some questions that come to my mind:
(1) Do you think it would be possible to add basic resource utilisation graphs to the balena dashboard?
This is basic tooling when managing multiple VMs in the cloud, and intuitively, managing multiple edge devices would have similar needs. Having to log in to the host OS is time-consuming, and problematic when the device is indeed slow.
(2) Out of the 3 changes I made (downgrading 3B+ to 3B, replacing Ubuntu 18.04 with Debian 10, migrating the application to 64-bit), does any of them strike you as a potential culprit?
I’ll try reverting each of them, but I wonder if they can be reasoned about logically before doing the experiments.
(3) What happens to images generated from Dockerfiles containing the %%BALENA_MACHINE_NAME%% tag when a device is of a different machine type than the application’s default one?
Unrelated to my problem, but @dfunckt’s question piqued my interest in it…
So is BALENA_MACHINE_NAME substituted correctly even if the arch of the individual device is different than the default arch of the app?
Yeah, I know that the 3B is less performant than the 3B+ (CPU clock speed 1.4 GHz -> 1.2 GHz). My question was which of the 3 things could have contributed to the performance drop in a major way. I guess I’ll experiment and revert each of the 3 changes separately!
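For scale, the clock-speed difference on its own works out as follows (peak clock only; this ignores thermal throttling and any other differences between the two boards):

```latex
\frac{1.2\ \text{GHz}}{1.4\ \text{GHz}} \approx 0.857
\quad\Longrightarrow\quad
1 - 0.857 \approx 14\%\ \text{lower peak clock per core}
```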
My question was which of the 3 things could have contributed to the performance drop in a major way
I would start by comparing performance on a 3B+. If there is no significant difference, continue with the OS and image architecture checks.
So is BALENA_MACHINE_NAME substituted correctly even if the arch of the individual device is different than the default arch of the app?
BALENA_MACHINE_NAME is substituted with the default device type of your app, and this is the reason we don’t allow an arbitrary mix of device types under one app. The restrictions on which devices you can add to an application with a given default device type ensure that images built from Dockerfile templates will work on every device in the fleet. So we allow hybrid fleets only with compatible architectures.
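A minimal Python sketch of the behaviour described above, under simplifying assumptions: the placeholder is rendered once from the fleet’s default device type (the individual device’s type is never consulted), and a device may join only if its architecture matches that default. The `DEVICE_ARCH` table, the helper names and the strict equality rule are hypothetical illustrations, not balena’s actual implementation; the real compatibility rules may be more permissive. The base image name in the example is also just illustrative.

```python
# Hypothetical, deliberately small mapping from device type slug to
# CPU architecture; only a few Raspberry Pi entries are shown.
DEVICE_ARCH = {
    "raspberrypi3": "armv7hf",
    "raspberrypi3-64": "aarch64",
    "raspberrypi4-64": "aarch64",
}

PLACEHOLDER = "%%BALENA_MACHINE_NAME%%"


def render_template(template: str, app_default_device_type: str) -> str:
    """Substitute the placeholder with the fleet's default device type.

    The device's own type plays no part: every device in the fleet
    receives the image built from this single rendering.
    """
    return template.replace(PLACEHOLDER, app_default_device_type)


def can_join(app_default_device_type: str, device_type: str) -> bool:
    """Simplified rule (assumption): allow a device only if its
    architecture equals that of the fleet's default device type."""
    return DEVICE_ARCH[device_type] == DEVICE_ARCH[app_default_device_type]


if __name__ == "__main__":
    template = f"FROM balenalib/{PLACEHOLDER}-debian:buster\n"
    print(render_template(template, "raspberrypi3-64"), end="")
    print(can_join("raspberrypi3-64", "raspberrypi4-64"))
    print(can_join("raspberrypi3-64", "raspberrypi3"))
```

Under these assumptions, a Pi 4 running a 64-bit OS could join a `raspberrypi3-64` fleet and run the same `raspberrypi3-64` image, whereas a Pi 3 running a 32-bit OS could not, since an aarch64 image will not run on it.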