Raspberry Pi 4 - Keeps restarting upload

Hi, I think you may benefit from this issue being fixed: https://github.com/balena-os/balena-raspberrypi/issues/383

Here is a temporary PR for this issue: https://github.com/balena-os/balena-raspberrypi/pull/407. It does not really fix the issue yet, as we are waiting for the official firmware fix.

You can check the latest beta firmware from https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=250990 and try to manually apply it on your machine to check if it fixes your problem.

Amazing

I will try that out and let you know…

Tim

Hi

I have applied this, and no change I am afraid - I have also tried different networks (including directly connected to the router).

When I connect via SSH to the Pi and try to do things on it, it occasionally freezes for 20 seconds at a time. I have reflashed with Raspbian Buster and have no delays like this on Buster (same Pi and same SD card).

I have set the same images to be deployed on a Pi 3…

UUID c7ea84de408c93b1e80a68c38a65c9d2

This also has restart issues. I have tested on 2 different Pi 3s and 3 different cards…

It’s fine with small images, but as soon as the images get large it all seems to stop working.

Tim

Hi Tim, would it be possible to bring the device online so our device engineers can take a look at the current state? Thanks.

Hi

Just bringing the device online now…

This is the Pi 3 that is stuck in the failing-to-download-image state…

Thanks

Tim

c7ea84de408c93b1e80a68c38a65c9d2

Actually - at last it’s managed to load all the images…

Took nearly a day!!!

Can you look at the logs to see what was happening, as I will need to do lots more deployments…

Thanks

Tim

Hi Tim, interesting that you are seeing this on both a Pi 4 and a Pi 3. This makes me think it could be an issue in our 64-bit OS builds. I’ll update you when I know more!

Thanks

I’m going to redeploy one of the containers now - it will be interesting to see how it behaves.

For info - while images are downloading, if I SSH into the Pi and issue commands (even something like ls), it often blocks for 20 seconds or so before I get a response…

Looks like something is blocking the OS.

Hope this helps.

Tim
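
For anyone wanting to put numbers on the stalls described above, here is a minimal sketch that times a trivial command repeatedly and reports the slow samples. It assumes Python 3 is available wherever it is run (for example inside one of the containers, or on the Raspbian install for comparison), which the thread does not confirm. The function names, the default command, the 60 samples and the 2-second threshold are all illustrative choices, not anything from balena.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return the wall-clock seconds it took."""
    start = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start


def sample(cmd, count, interval_s):
    """Time cmd `count` times, sleeping interval_s after each run."""
    durations = []
    for _ in range(count):
        durations.append(time_command(cmd))
        time.sleep(interval_s)
    return durations


def find_stalls(durations, threshold_s):
    """Return (index, seconds) for every sample slower than threshold_s."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold_s]


if __name__ == "__main__":
    # Hypothetical defaults: time `ls /` once a second for about a minute
    # and flag anything slower than 2 seconds.
    cmd = sys.argv[1:] or ["ls", "/"]
    durations = sample(cmd, count=60, interval_s=1.0)
    for index, seconds in find_stalls(durations, threshold_s=2.0):
        print(f"sample {index}: {seconds:.1f}s")
    print(f"slowest sample: {max(durations):.1f}s")
```

Running it once while an image download is in progress and once while the device is idle would show whether the pauses really line up with the downloads.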

For info - The device also keeps going offline…

It was offline this morning, and has just gone offline again…

Thanks

Tim

Thanks Tim, don’t worry about that device for now. We are going to focus on trying to reproduce the issue locally. Is your application doing anything fancy or esoteric? Would you be okay to share the Dockerfile? Thanks again.

Hi

It’s not doing anything wacky, and remember it’s the downloads that are failing, not the running operation.

It’s got private repos that can’t be shared, I am afraid…

Container 1: Mongo
Container 2: Runtime APIs, Koa, Node.js
Container 3: Authorisation Server, Koa, Node.js
Container 4: NLP Server, Snips, Python, Rust - this is the large (2.2 GB) deployment
Container 5: Vue Web Application, Node.js

Let me know if you need anything else…

The containers are all based on your core Docker base images

Containers 2, 3, 5
FROM balenalib/generic-aarch64-debian-node:latest-run

Container 4
FROM balenalib/generic-aarch64-ubuntu:latest-run

Tim

Hi Tim,

I’ve just jumped back onto your device to check some things out, and it seems to be looking much better now. Have you changed anything at your end?

Thanks.

Nope - nothing changed at all. It was off again this morning and I had to reboot it… It seems to die every day…

It’s only misbehaving during the download period, so I think there must be something wrong in the download code that’s causing the OS to misbehave?

Tim

Also - all the services (except for Mongo) show Exited or Downloaded, not Running

What does that mean?

About 10-15 minutes ago, I was logged into your device, and suddenly every single command I tried would result in an IO error. Are you able to power-cycle the device? That said, I’m not convinced it will actually come back online.

Afraid not - I’m in the office. I will power it off when I get home…

Something is killing the device which is not good…

More than happy to work with you guys on this to get it stable, as it’s a really worthwhile project…

For info - I am Head of Innovation at BJSS, and I am looking at this for deployment of not only our IoT solutions, but also as part of the delivery of private-by-design conversational AI platforms…

Tim

Hi

I’ve just powered the device back on if you want to jump on and have a look at why it keeps crashing…

Tim

Thanks Tim, unfortunately the device appears to be offline currently. I’ll keep checking to see if it reconnects.

It will have crashed again

I will reboot it again, but not till Friday morning now as I’m away

Tim