Unresponsive Raspberry Pi 3B on a 64-bit system – how best to diagnose it?

Hey!

I’m having connectivity/performance issues with my devices:

  • ports that are supposed to be hosting web applications are not responsive
  • shell sessions inside containers are very slow
  • web diagnostics don’t work at all, failing with the following messages:
      (1) Error reported when querying checks data: ssh client socket error while initiating SSH connection
      (2) ssh client socket error while initiating SSH connection
      (3) 6x - Supervisor Current State error: tunneling socket could not be established, cause=socket hang up

I’m running this application: https://github.com/kkom/balena-unifi-controller on a 64-bit Raspberry Pi 3B.

I made a few changes before the device started to behave in this way:

  • hardware migration from Raspberry Pi 3B+ to Raspberry Pi 3B
  • Balena OS migration from 32bit to 64bit
  • base image migration for one of the containers from Ubuntu 18.04 32-bit (Bionic) to Debian 10 64-bit (Buster)

I could go back and methodically test the effect of each of them, but I wonder if it’s possible to inspect the device and find the problem more directly? Maybe it’s something as simple as insufficient RAM/CPU…

Could someone advise me on the best way to inspect the device if the web diagnostics fail?

Alternatively, an example device is 6b007f194afc8899597f986feab31193, and I’ve granted support access to it for a week! I’ll just need to refresh some secrets stored in env vars once this is done!

Thanks a lot in advance!

Konrad

I’ve also published a few extra ports on the container:

These errors suggest the device hasn’t booted up correctly; specifically, the Supervisor doesn’t seem to be starting up. When you say “Balena OS migration from 32bit to 64bit”, how did you go about doing that?

Hey, the best way to check manually would be to use the web-based terminal to connect to the device’s host OS, where some basic tools are available. I took a quick look at the device you shared, and it does seem to be very heavily loaded (load average: 20.58 10.74 4.18), which is almost certainly the cause of the slowness.
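
As a rough sketch of how to read those numbers, assuming Python 3 is available (the host OS may not ship it, so this would typically run inside one of the application containers, which share the host kernel and normally see the same `/proc/loadavg` and `/proc/meminfo`): the helpers below parse both files, normalise the load by core count and report how much memory is still available. The function names are hypothetical. On the host OS itself, `cat /proc/loadavg` and `cat /proc/meminfo` show the same raw values.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_meminfo(text):
    """Map each /proc/meminfo key to its first numeric field (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def summarize(loadavg_text, meminfo_text, cores):
    """Normalise load by core count and report the share of memory still available."""
    one, five, fifteen = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    return {
        # Roughly: tasks running or waiting (incl. uninterruptible I/O) per core.
        "load_per_core_1m": one / cores,
        "load_per_core_15m": fifteen / cores,
        # A 1-minute figure above the 15-minute one suggests load is still rising.
        "load_rising": one > fifteen,
        "mem_available_pct": 100.0 * mem["MemAvailable"] / mem["MemTotal"],
    }


def summarize_host():
    """Read the live files; os.cpu_count() should be 4 on a quad-core Pi 3B."""
    with open("/proc/loadavg") as f:
        loadavg_text = f.read()
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    return summarize(loadavg_text, meminfo_text, os.cpu_count() or 1)
```

Calling `print(summarize_host())` on the device gives a one-line snapshot. With the load average quoted above (20.58 10.74 4.18) and four cores, `load_per_core_1m` comes out at about 5.1, i.e. several tasks queued per core, and `load_rising` is `True`.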

Thanks @dfunckt & @_Page!

@dfunckt, I created a whole new application with a different base platform. I also used 64-bit images for both containers:

@_Page, it certainly feels like it! Thanks for confirming it!

Some questions that come to my mind:

(1) Do you think it would be possible to add basic resource utilisation graphs to the balena dashboard?

This feels like basic tooling for managing multiple VMs in the cloud; intuitively, managing multiple edge devices would have similar needs. Having to log in to the host OS is time-consuming, and it becomes problematic when the device is genuinely slow.

(2) Out of the 3 changes I made (downgrading 3B+ to 3B, replacing Ubuntu 18.04 with Debian 10, migrating the application to 64 bits), does any of them strike you as a potential culprit?

I’ll try reverting each of them, but I wonder if they can be reasoned about logically before doing the experiments.

(3) What happens to images generated from Dockerfiles containing the %%BALENA_MACHINE_NAME%% placeholder when a device’s machine type differs from the application’s default one?

Unrelated to my problem, but @dfunckt’s question piqued my interest in it…

Hey,

BALENA_MACHINE_NAME is substituted into the Dockerfile at build time, so as long as the device’s architecture is the same as (or compatible with) the application’s, it will work fine.

If your issue is performance then the drop from the Pi 3B+ to the 3B wouldn’t help; the 3B+ is a faster chip.

The idea of resource graphs is an interesting one, I will make an issue for the product team to look at, thanks.

Hey!

So is BALENA_MACHINE_NAME substituted correctly even if the arch of the individual device is different than the default arch of the app?

Yeah – I know that 3B is less performant than 3B+ (1.4 GHz -> 1.2 GHz CPU clock speed). My question was which of the 3 things could have contributed to the performance drop in a major way. I guess I’ll experiment with that & revert each of the 3 changes separately!

Thanks!

Best,
Konrad

> My question was which of the 3 things could have contributed to the performance drop in a major way

I would start by comparing performance on 3B+. And, if there is no significant difference, continue with the OS and image arch checks.
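
For the OS and image arch checks, one quick sketch, assuming Python 3 is available inside the container: containers share the host kernel, so `platform.machine()` reports uname’s machine field, which describes the kernel (e.g. `aarch64` on a 64-bit OS) rather than necessarily the image, while the pointer size of the running interpreter shows whether the image’s userland is 32- or 64-bit. The helper names are hypothetical.

```python
import platform
import struct


def kernel_machine():
    """uname's machine field, e.g. "aarch64" under a 64-bit ARM kernel.

    Containers share the host kernel, so this describes the OS rather than
    necessarily the container image.
    """
    return platform.machine()


def userland_bits():
    """Pointer size of this Python interpreter: 64 in an arm64 image, 32 in armv7."""
    return struct.calcsize("P") * 8


def arch_report(machine, bits):
    """Flag a 32-bit userland running on a 64-bit ARM kernel."""
    if machine in ("aarch64", "arm64") and bits == 32:
        return f"{machine} kernel with a 32-bit userland"
    return f"{machine} kernel, {bits}-bit userland"
```

Running `print(arch_report(kernel_machine(), userland_bits()))` in each container is a cheap way to confirm the migration took effect; after moving both the OS and the images to 64-bit, one would expect `aarch64 kernel, 64-bit userland`.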

> So is BALENA_MACHINE_NAME substituted correctly even if the arch of the individual device is different than the default arch of the app?

BALENA_MACHINE_NAME is substituted with your app’s default device type. This is why we don’t allow an arbitrary mix of device types within an app: the restrictions on which devices you can add to an application with a given default type ensure that images built from Dockerfile templates will work on every device in the fleet. So we allow hybrid fleets only with compatible architectures.
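
To make the substitution concrete, here is a minimal Python sketch, not balena’s actual builder: every `%%NAME%%` placeholder is filled from one set of values derived from the application’s default device type, so every device in the fleet receives the same rendered Dockerfile regardless of its own type. `render_template` is a hypothetical name, and the values `raspberrypi3-64` and `aarch64` are illustrative assumptions for a 64-bit Raspberry Pi 3 application.

```python
import re

# Illustrative values for an application whose default device type is a
# 64-bit Raspberry Pi 3 (assumption; check your own fleet's settings).
FLEET_DEFAULTS = {"BALENA_MACHINE_NAME": "raspberrypi3-64", "BALENA_ARCH": "aarch64"}

PLACEHOLDER = re.compile(r"%%([A-Z0-9_]+)%%")


def render_template(template, variables):
    """Replace each %%NAME%% with variables[NAME]; leave unknown names untouched."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


TEMPLATE = "FROM balenalib/%%BALENA_MACHINE_NAME%%-debian:buster\n"

# Rendered once per application, not once per device: every device in the
# fleet gets this exact Dockerfile, whatever its own device type.
rendered = render_template(TEMPLATE, FLEET_DEFAULTS)
```

Because the device’s own type never enters the rendering, a device is only safe in the fleet if its architecture can run the image built for the default type, which is the compatibility restriction described above.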

Brilliant! Thanks a lot for this detail!

A little update for everyone who helped me here: the issue wasn’t the 64-bit system, the 3B downgrade or the migration to Buster!

I created a new topic to discuss my findings without the confusion of our earlier exchange here: